CN112575388A - Single-molecule target gene library building method and kit thereof - Google Patents
Single-molecule target gene library building method and kit thereof
- Publication number
- CN112575388A
- Authority
- CN
- China
- Prior art keywords
- sequencing
- sequence
- adaptor
- dna
- series
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B50/00—Methods of creating libraries, e.g. combinatorial synthesis
- C40B50/06—Biochemical methods, e.g. using enzymes or whole viable microorganisms
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B40/00—Libraries per se, e.g. arrays, mixtures
- C40B40/04—Libraries containing only organic compounds
- C40B40/06—Libraries containing nucleotides or polynucleotides, or derivatives thereof
Landscapes
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biochemistry (AREA)
- Health & Medical Sciences (AREA)
- Chemical Kinetics & Catalysis (AREA)
- General Chemical & Material Sciences (AREA)
- Medicinal Chemistry (AREA)
- Molecular Biology (AREA)
- Microbiology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
A single-molecule target gene library construction method and a kit thereof. The method comprises: an extension step of mixing a template molecule with a target probe linked to a first sequencing adapter, so that the target probe binds to a target region of the template molecule and is extended to give a target probe extension product; a second sequencing adapter ligation step of adding a second sequencing adapter that has complementary forward and reverse strands, wherein the 5' end of the forward strand can be ligated to the 3' end of the target probe extension product and the 3' end of the reverse strand is linked to a random sequence, and reacting to obtain a second sequencing adapter ligation product; a melting treatment that removes from the product the single-stranded molecules linked to the random sequence, giving a single-stranded product linked to the first sequencing adapter; and a double-strand synthesis step of adding a first primer and a second primer and reacting to obtain an amplification product. The invention integrates library construction with target gene enrichment and is broadly applicable to single-stranded or double-stranded DNA samples of various lengths and of high or low starting amounts.
Description
Technical Field
The invention relates to the technical field of gene sequencing, and in particular to a single-molecule target gene library construction method and a kit thereof.
Background
Existing sample preparation for targeted Next Generation Sequencing (NGS) requires four steps: library construction, pre-capture amplification, hybridization capture, and post-capture amplification. The steps must be performed in sequence and none can be omitted; the whole process takes about 2 to 3 days, is costly in both time and money, and linking the steps is technically demanding. Furthermore, the starting DNA must be pretreated by fragmentation, and library fragments must be size-selected after library construction. As a result, existing library construction techniques cope poorly with severely degraded or trace DNA samples (library quality is generally difficult to guarantee when the DNA amount is below 20 ng). In addition, both rounds of amplification, before and after capture, are exponential, which introduces substantial errors and bias and produces an excessively high technical error background, so that low-frequency gene mutations (below one in a thousand) cannot be detected.
Disclosure of Invention
According to a first aspect, one embodiment provides a single-molecule target gene library construction method comprising:
an extension step, comprising mixing a template molecule with a target probe linked to a first sequencing adapter, wherein the target probe binds specifically to a target region of the template molecule and undergoes an extension reaction to give a target probe extension product;
a second sequencing adapter ligation step, comprising adding a second sequencing adapter to the reaction system obtained in the extension step, wherein the second sequencing adapter comprises complementary forward and reverse strands, the 5' end of the forward strand can be ligated to the 3' end of the target probe extension product, and the 3' end of the reverse strand is linked to a random nucleotide sequence; reacting to obtain a second sequencing adapter ligation product; and then performing a melting treatment to remove from the product the single strand linked to the random nucleotide sequence, giving a single-stranded product linked to the first sequencing adapter;
and a double-strand synthesis step, comprising adding a first primer and a second primer to the single-stranded product linked to the first sequencing adapter and reacting to obtain an amplification product for sequencing on the instrument, wherein the first primer contains a sequence complementary to the first sequencing adapter and the second primer contains a sequence complementary to the second sequencing adapter.
According to a second aspect, an embodiment provides a library constructed by the method of the first aspect.
According to a third aspect, one embodiment provides a kit comprising a first sequencing adapter and a second sequencing adapter, wherein the first sequencing adapter is linked to a target probe capable of binding to a target region of a template molecule and undergoing an extension reaction, the second sequencing adapter has complementary forward and reverse strands, the 5' end of the forward strand can be ligated to the 3' end of a target probe extension product, and the 3' end of the reverse strand is linked to a random nucleotide sequence.
The single-molecule target gene library construction method and kit combine the library construction step with the targeted gene enrichment step and require no fragmentation treatment, which effectively shortens the time required for library construction and improves its efficiency. The method or kit is suitable for single-stranded or double-stranded DNA samples of various lengths, cDNA reverse-transcribed from RNA, bisulfite-treated DNA (used for DNA methylation sequencing), and various severely degraded DNA (DNA extracted from formalin-fixed and paraffin-embedded (FFPE) tissues or forensic samples, cell-free DNA (cfDNA) contained in body fluids, and the like), and it accommodates a wide range of starting amounts (0.1 to 1000 ng).
Drawings
FIG. 1 is a flow diagram of library construction in one embodiment;
FIG. 2 is a diagram of a second sequencing adapter in one embodiment.
Detailed Description
The present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. Similar elements in different embodiments are given associated, similar reference numbers. In the following description, numerous details are set forth to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted, or replaced with other elements, materials, or methods, in different instances. In some instances, certain operations related to the present application are not shown or described in detail, so that the core of the application is not obscured by excessive description; a detailed account of these operations is unnecessary for those skilled in the art, who can fully understand them from the description in the specification and the general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Also, the steps or actions in the method descriptions may be reordered or adjusted in ways apparent to one of ordinary skill in the art. Thus, the various sequences in the specification and drawings serve only to describe particular embodiments and do not imply a required order, unless it is stated that a particular order must be followed.
Ordinal terms used for components herein, such as "first" and "second", serve only to distinguish the objects described and carry no sequential or technical meaning.
Herein, construction of a DNA library from a sample for high-throughput sequencing is referred to as library construction for short. The conventional library construction process is as follows: the ends of double-stranded DNA molecules are blunted through a series of enzymatic reactions, and a double-stranded first sequencing adapter and a double-stranded second sequencing adapter are then ligated to the two ends of the DNA molecules, respectively.
As used herein, "μM" denotes the concentration unit μmol/L (micromoles per liter).
Hairpin structure means: a specific spatial structure formed when a pair of inverted repeats folds back and pairs.
In the prior art, preparation of NGS targeted libraries (including newer single-stranded library construction methods) generally follows one of two routes. The mainstream capture-based route requires four necessary steps, namely library construction, pre-capture amplification, hybridization capture, and post-capture amplification, and the whole process generally takes 2 to 3 days. The other common route is amplicon library construction, which generally performs multiplex PCR first and then builds a library from the PCR products; some commercial kits add adapter sequences for the corresponding NGS platform to the 5' end of the primers during multiplex PCR so that the two steps are merged into one.
The first route strictly separates library construction from hybridization capture, so it has many steps and a long turnaround, and it depends on magnetic-bead capture based on streptavidin-biotin binding; these beads are expensive and must be imported. The second route is simpler, but because it is based on multiplex PCR it has the following problems: (1) the starting input required for library construction is high; (2) the number of target sites (plex number) in a single reaction cannot be too large, so gene detection with a large probe panel is difficult to complete in a single tube and must be split into several single-tube reactions whose products are then combined, which greatly increases cost and hands-on time, limits single-tube detection throughput, and hinders wider adoption; (3) PCR requires primers pairing at both ends, so structural variants such as unknown gene fusions (novel fusions) and virus insertion sites cannot be detected; (4) the exponential amplification of PCR makes gene copy number variation undetectable; (5) multiplex PCR inevitably suffers from amplification bias and therefore low uniformity, so that some regions targeted by the panel are poorly covered while others are covered in excess.
By comparison, in some embodiments the two steps of library construction and target gene enrichment are integrated into a single process. Through linear amplification, this innovation overcomes both the many steps and high cost of the mainstream library construction and hybridization capture process and the defects inherent in amplicon library construction, such as low single-tube detection throughput, poor uniformity, and the inability to effectively detect genome structural variation and gene copy number variation.
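The contrast between linear and exponential amplification can be illustrated with an idealized calculation. The sketch below assumes 100% efficiency per cycle and is not taken from the patent: in the linear case only the original templates are copied in each cycle (one probe, no opposing primer), whereas in the exponential case every product becomes a template in the next cycle. The function names are hypothetical.

```python
def linear_copies(templates: int, cycles: int) -> int:
    """Extension products after `cycles` rounds when only the original
    templates are copied in each round (idealized, 100% efficiency)."""
    return templates * cycles


def exponential_copies(templates: int, cycles: int) -> int:
    """Molecules after `cycles` rounds of ideal PCR, in which every
    product serves as a template in the following round."""
    return templates * 2 ** cycles


if __name__ == "__main__":
    # Compare the two growth modes for 100 starting templates.
    for n in (10, 20, 30):
        print(n, linear_copies(100, n), exponential_copies(100, n))
```

Because every linear product is copied directly from an original template, a polymerase error made in one cycle is not inherited by products of later cycles, which is one way to read the lower error background claimed above.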
In some embodiments, because the P5 end of the constructed library carries molecular tags, PCR and sequencing errors can be effectively corrected, which enables ultra-low-frequency detection.
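One common way to use molecular tags for error correction is to group reads that share a tag and take a per-position majority vote. The sketch below illustrates that general idea only; the patent does not specify its analysis procedure, and the function name, the input format, and the assumption that reads in a family have equal length are all hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_tag(reads):
    """Collapse reads sharing a molecular tag into one consensus sequence.

    `reads` is an iterable of (tag, sequence) pairs; sequences within a
    tag family are assumed to be aligned and of equal length.
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        # Majority base at each position across the family.
        consensus[tag] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


if __name__ == "__main__":
    example = [
        ("AACG", "ACGT"),
        ("AACG", "ACGT"),
        ("AACG", "ACTT"),  # one read with an isolated error at position 3
        ("GGTA", "TTTT"),
    ]
    print(consensus_by_tag(example))
```

A production pipeline would normally also group by mapping position and tolerate errors within the tag itself; both are omitted here for brevity.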
In some embodiments, the present invention is also well suited to a wide variety of DNA samples that conventional NGS techniques cannot handle, such as severely degraded or trace samples. Many samples used in clinical testing are of this kind, for example DNA extracted from FFPE (formalin-fixed and paraffin-embedded) samples and extracellular free DNA from body fluids (plasma, pleural effusion, urine, etc.). The method also places no requirement on sample length and does not require fragmenting intact long-fragment genomic DNA, which saves time and cost.
In some embodiments, the starting amount of sample suitable for the present invention is 0.1 to 1000 ng, and the method is particularly suitable for low-input samples.
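To give a sense of scale for the 0.1 to 1000 ng range, the input mass can be converted into an approximate number of genome copies. The sketch below assumes human DNA and a mass of about 3.3 pg per haploid genome; that figure and the function names are assumptions for illustration and do not come from the patent.

```python
# Assumption: approximate mass of one haploid human genome, in picograms.
HAPLOID_GENOME_PG = 3.3


def haploid_copies(input_ng: float) -> float:
    """Approximate number of haploid genome copies in `input_ng` of DNA."""
    return input_ng * 1000.0 / HAPLOID_GENOME_PG


def expected_mutant_copies(input_ng: float, allele_fraction: float) -> float:
    """Expected number of mutant molecules at a given allele fraction."""
    return haploid_copies(input_ng) * allele_fraction


if __name__ == "__main__":
    for ng in (0.1, 20, 1000):
        print(ng, round(haploid_copies(ng)), expected_mutant_copies(ng, 0.001))
```

Under these assumptions a low-input sample contains only a handful of mutant molecules at a one-in-a-thousand frequency, which is why per-molecule recovery matters for low-frequency detection.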
In a first aspect, some embodiments provide a single-molecule target gene library construction method comprising:
an extension step, comprising mixing a template molecule with a target probe linked to a first sequencing adapter, wherein the target probe binds specifically to a target region of the template molecule and undergoes an extension reaction to give a target probe extension product;
a second sequencing adapter ligation step, comprising adding a second sequencing adapter to the reaction system obtained in the extension step, wherein the second sequencing adapter comprises complementary forward and reverse strands, the 5' end of the forward strand can be ligated to the 3' end of the target probe extension product, and the 3' end of the reverse strand is linked to a random nucleotide sequence; reacting to obtain a second sequencing adapter ligation product; and then performing a melting treatment to remove from the product the single strand linked to the random nucleotide sequence, giving a single-stranded product linked to the first sequencing adapter;
and a double-strand synthesis step, comprising adding a first primer and a second primer to the single-stranded product linked to the first sequencing adapter and reacting to obtain an amplification product for sequencing on the instrument, wherein the first primer contains a sequence complementary to the first sequencing adapter and the second primer contains a sequence complementary to the second sequencing adapter.
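The extension and ligation steps above can be sketched as string operations on 5'-to-3' sequences. This is a simplified model under stated assumptions: the probe binds where the template contains its exact reverse complement, extension runs to the 5' end of the template, and all sequences and function names are hypothetical placeholders rather than real adapter sequences.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend_probe(template: str, probe: str, first_adapter: str):
    """Model the extension step.

    The probe anneals where the template contains revcomp(probe) and is
    extended from its 3' end to the 5' end of the template. Returns the
    extension product (5'->3'), or None if the probe has no binding site.
    """
    start = template.find(revcomp(probe))
    if start < 0:
        return None
    upstream = template[:start]  # template region copied by the polymerase
    return first_adapter + probe + revcomp(upstream)


def ligate_second_adapter(extension_product: str, forward_strand: str) -> str:
    """Model ligation of the forward strand to the product's 3' end.

    The reverse strand only acts as a splint and is discarded by the
    melting treatment, so it does not appear in the returned strand.
    """
    return extension_product + forward_strand


if __name__ == "__main__":
    probe = "ACGTAC"
    template = "TTGACA" + revcomp(probe) + "CCC"
    product = extend_probe(template, probe, first_adapter="AAAA")
    print(ligate_second_adapter(product, forward_strand="GGGG"))
```

The resulting single strand carries the first adapter at its 5' end and the second adapter at its 3' end, which is the substrate for the double-strand synthesis step.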
In some embodiments, the template molecule is single-stranded DNA and the target probe extension product is also single-stranded DNA.
In some embodiments, the template molecule can be any type of DNA molecule, including but not limited to single-stranded DNA and single-stranded DNA obtained by melting double-stranded DNA. The method accommodates a wide range of starting amounts (0.1 to 1000 ng) and DNA molecules of various lengths, so the fragmentation step that precedes conventional library construction can be omitted (in conventional methods, long DNA fragments must be broken to about 300 bp by physical or chemical means; otherwise library construction efficiency falls and the molecules cannot be sequenced by subsequent high-throughput sequencing).
In some embodiments, the template molecule is derived from at least one of bisulfite-treated DNA, DNA from formalin-fixed and paraffin-embedded (FFPE) tissue, DNA extracted from forensic samples, cell-free DNA (cfDNA) contained in body fluids, DNA extracted from ancient fossils or from biological remains recovered by archaeological excavation, and the like. In the case of a double-stranded DNA sample, the DNA can be dissociated into single strands, typically by heat denaturation or the like, to give the template molecule.
In some embodiments, the method of melting the double-stranded DNA may be the same as the melting treatment in the second sequencing adapter ligation step, and is typically heat denaturation.
The invention is applicable to a wide range of DNA sample types, including samples that conventional NGS library construction cannot handle, such as severely degraded and/or trace samples. It places no requirement on the length of the template molecules in the sample and does not require fragmenting intact long-fragment genomic DNA.
In some embodiments, the 5' end of the forward strand of the second sequencing adapter is modified with a phosphate group.
In some embodiments, a molecular tag is inserted between the first sequencing adapter and the target probe.
In some embodiments, the molecular tag is a random nucleotide sequence.
In some embodiments, the molecular tag is 4-19 nt in length. Specifically, it may be 4 nt, 5 nt, 6 nt, 7 nt, 8 nt, 9 nt, 10 nt, 11 nt, 12 nt, 13 nt, 14 nt, 15 nt, 16 nt, 17 nt, 18 nt, 19 nt, and the like.
In some embodiments, the random nucleotide sequence linked to the 3' end of the reverse strand is 5-15 nt in length, and specifically may be 5 nt, 6 nt, 7 nt, 8 nt, 9 nt, 10 nt, 11 nt, 12 nt, 13 nt, 14 nt, 15 nt, and the like.
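The tag length determines how many distinct tags are available: a random tag of n nucleotides has 4^n possible sequences. The following sketch computes that number for the stated 4-19 nt range and draws a random tag; the function names are hypothetical and the uniform base composition is an assumption.

```python
import random

MIN_TAG_NT, MAX_TAG_NT = 4, 19  # length range stated in the text


def _check_length(length: int) -> None:
    if not MIN_TAG_NT <= length <= MAX_TAG_NT:
        raise ValueError(f"tag length must be {MIN_TAG_NT}-{MAX_TAG_NT} nt")


def tag_diversity(length: int) -> int:
    """Number of distinct random tags of the given length (4**length)."""
    _check_length(length)
    return 4 ** length


def random_tag(length: int, rng=None) -> str:
    """Draw one random tag, assuming each base is equally likely."""
    _check_length(length)
    rng = rng or random.Random()
    return "".join(rng.choice("ACGT") for _ in range(length))


if __name__ == "__main__":
    for n in (4, 8, 12, 19):
        print(n, tag_diversity(n))
    print(random_tag(8))
```

Longer tags reduce the chance that two different template molecules receive the same tag, at the cost of sequencing cycles spent on the tag.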
In some embodiments, the first primer contains a sequence that is complementarily paired with the first sequencing adapter and the second primer contains a sequence that is complementarily paired with the second sequencing adapter.
In some embodiments, the first primer comprises an inner adapter sequence and an outer adapter sequence, the 5' end of the inner adapter sequence being linked to the 3' end of the outer adapter sequence, and the inner adapter sequence being capable of reverse-complementary pairing with the first sequencing adapter.
In some embodiments, the first primer contains or does not contain a first sample tag.
In some embodiments, when the first primer comprises a first sample tag, the first sample tag is located between the inner and outer adaptor sequences.
In some embodiments, the length of the first sample tag is 4nt to 15nt, and specifically may be 4nt, 5nt, 6nt, 7nt, 8nt, 9nt, 10nt, 11nt, 12nt, 13nt, 14nt, 15nt, and so on.
In some embodiments, the second primer comprises an inner adapter sequence and an outer adapter sequence, the 5' end of the inner adapter sequence being linked to the 3' end of the outer adapter sequence, and the inner adapter sequence being capable of reverse-complementary pairing with the second sequencing adapter.
In some embodiments, the second primer contains or does not contain a second sample tag.
In some embodiments, when the second primer comprises a second sample tag, the second sample tag is located between the inner and outer adaptor sequences.
In some embodiments, the length of the second sample tag is 4-15 nt, and specifically may be 4 nt, 5 nt, 6 nt, 7 nt, 8 nt, 9 nt, 10 nt, 11 nt, 12 nt, 13 nt, 14 nt, 15 nt, and so on.
In some embodiments, the number of amplification cycles for the extension reaction is ≥ 10. The number of cycles is not otherwise limited and can be selected as required.
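The primer layout described above (outer adapter sequence at the 5' side, optional sample tag in the middle, inner adapter sequence at the 3' side) can be sketched as a simple assembly function. The function name and the placeholder sequences are hypothetical; the inner sequence is taken as given and its relationship to the sequencing adapter is not modeled.

```python
from typing import Optional


def build_primer(outer: str, inner: str, sample_tag: Optional[str] = None) -> str:
    """Assemble a primer 5'->3' as outer + [sample tag] + inner.

    The sample tag is optional; when present it must be 4-15 nt long,
    matching the range stated in the text.
    """
    if sample_tag is not None and not 4 <= len(sample_tag) <= 15:
        raise ValueError("sample tag must be 4-15 nt long")
    return outer + (sample_tag or "") + inner


if __name__ == "__main__":
    # Placeholder sequences, not real platform adapter sequences.
    print(build_primer("AAAA", "CCCC", sample_tag="GTGTGT"))
    print(build_primer("AAAA", "CCCC"))
```

Giving each sample a different tag in the first and/or second primer allows several libraries to be mixed and separated again after sequencing.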
In some embodiments, the number of amplification cycles for the extension reaction in the extension step is 10-500 cycles.
In some embodiments, in the extension step, each cycle runs as follows: 94-98 ℃ for 10-60 seconds; 55-65 ℃ for 10-60 seconds; 68-72 ℃ for 10-60 seconds.
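The cycling conditions above can be written down as a small program check. The sketch below validates a three-step program against the stated temperature and time ranges and the 10-500 cycle range, and estimates the hold time (ramp times ignored). The step labels "denature", "anneal", and "extend" are assumed names for the three temperature windows, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str       # assumed label: "denature", "anneal" or "extend"
    temp_c: float   # hold temperature in degrees Celsius
    seconds: int    # hold time in seconds


# Temperature windows from the text, keyed by the assumed step labels.
TEMP_RANGES = {"denature": (94, 98), "anneal": (55, 65), "extend": (68, 72)}


def validate_program(steps, cycles: int):
    """Return a list of human-readable problems; empty if all is in range."""
    problems = []
    if not 10 <= cycles <= 500:
        problems.append(f"cycles={cycles} outside 10-500")
    for step in steps:
        low, high = TEMP_RANGES[step.name]
        if not low <= step.temp_c <= high:
            problems.append(f"{step.name}: {step.temp_c} C outside {low}-{high} C")
        if not 10 <= step.seconds <= 60:
            problems.append(f"{step.name}: {step.seconds} s outside 10-60 s")
    return problems


def total_seconds(steps, cycles: int) -> int:
    """Total hold time of the program, ignoring ramp times."""
    return cycles * sum(step.seconds for step in steps)


if __name__ == "__main__":
    program = [Step("denature", 95, 30), Step("anneal", 60, 30), Step("extend", 72, 30)]
    print(validate_program(program, cycles=20))
    print(total_seconds(program, cycles=20))
```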
In some embodiments, the extension step further comprises, after obtaining the target probe extension product, subjecting the obtained target probe extension product to a purification treatment.
In some embodiments, the extension product is purified with magnetic beads. Magnetic beads for purification are commercially available, for example and without limitation from Nanjing Nuoweizan Biotech Co., Ltd.
In some embodiments, in the second sequencing linker ligation step, the ligation reaction is performed, specifically at 22-40 ℃ for 0.5-2 hours.
In some embodiments, in the second sequencing linker ligation step, the ligase employed includes, but is not limited to, T4 DNA ligase. T4 DNA ligase is commercially available.
In some embodiments, in the second sequencing linker ligation step, the melting process may generally be a denaturation process.
In some embodiments, the denaturing treatment may generally be a heat denaturing treatment.
In some embodiments, the heat denaturation treatment may specifically be heating the target molecule to at least 80 ℃ for at least 1 min.
In some embodiments, the heat denaturation may be performed at 80-98 deg.C for 1-30min to obtain single-stranded template molecules.
In some embodiments, the first sequencing linker includes, but is not limited to, any one of a P5-terminal sequencing linker of the Illumina sequencing platform, a P2-terminal sequencing linker of the MGI sequencing platform.
In some embodiments, the second sequencing linker includes, but is not limited to, any of a P7-terminal sequencing linker of the Illumina sequencing platform, a P1-terminal sequencing linker of the MGI sequencing platform.
In some embodiments, the structure of the second sequencing linker is shown in Fig. 2: it has a forward strand and a reverse strand that pair complementarily; the 5' end of the forward strand is modified with a phosphate group, and the 3' end of the reverse strand is linked in series to a random nucleotide sequence, which may be 5-15 nt long. The 5' end of the reverse strand carries no phosphate modification.
In a second aspect, in some embodiments, there is provided a library constructed using the method of the first aspect.
In some embodiments, the library construction and the targeted gene enrichment are combined, so that the experimental steps are effectively shortened, library construction, amplification, capture and re-amplification are not required, the material consumption is remarkably reduced, and the experimental time is shortened.
In some embodiments, the second sequencing adapter is a hairpin adapter, without biotin, and does not require streptavidin magnetic bead capture (streptavidin magnetic beads are expensive to manufacture).
In some embodiments, the number of sites can be from 1 to 1 thousand for each single-tube reaction, with each site corresponding to a primer with a specific target gene binding region.
In a third aspect, in some embodiments, there is provided a kit comprising a first sequencing adaptor having a target probe linked in series, the target probe being capable of binding to a target region of a template molecule and extending a reaction, a second sequencing adaptor having a complementary pair of a forward strand and a reverse strand, the 5 ' end of the forward strand of the second sequencing adaptor being capable of linking in series to the 3 ' end of a target probe extension product, and the 3 ' end of the reverse strand being linked in series to a random nucleotide sequence.
In some embodiments, the template molecule is single-stranded DNA and the target probe extension product is also single-stranded DNA.
In some embodiments, the 5' end of the forward strand of the second sequencing linker is modified with a phosphate group.
In some embodiments, a molecular tag is linked in series between the first sequencing adapter and the target probe.
In some embodiments, the molecular tag is 4-19nt in length.
In some embodiments, the random nucleotide sequence concatenated at the 3' end of the reverse strand of the second sequencing linker is 5-15nt in length.
In some embodiments, the kit further comprises a first primer comprising a sequence that is complementarily pairable with the first sequencing adapter, a second primer comprising a sequence that is complementarily pairable with the second sequencing adapter.
In some embodiments, the first primer comprises an inner adaptor sequence and an outer adaptor sequence, the 5' end of the inner adaptor sequence being linked in series to the 3' end of the outer adaptor sequence, and the inner adaptor sequence being capable of reverse-complementary pairing with the first sequencing adaptor.
In some embodiments, the first primer contains or does not contain a first sample tag.
In some embodiments, when the first primer comprises a first sample tag, the first sample tag is located between the inner and outer adaptor sequences.
In some embodiments, the length of the first sample tag is 4nt to 15nt, and specifically may be 4nt, 5nt, 6nt, 7nt, 8nt, 9nt, 10nt, 11nt, 12nt, 13nt, 14nt, 15nt, and so on.
In some embodiments, the second primer comprises an inner adaptor sequence and an outer adaptor sequence, the 5' end of the inner adaptor sequence being linked in series to the 3' end of the outer adaptor sequence, and the inner adaptor sequence being capable of reverse-complementary pairing with the second sequencing adaptor.
In some embodiments, the second primer contains or does not contain a second sample tag.
In some embodiments, when the second primer comprises a second sample tag, the second sample tag is located between the inner and outer adaptor sequences.
In some embodiments, the length of the second sample tag is 4nt to 15nt, and specifically may be 4nt, 5nt, 6nt, 7nt, 8nt, 9nt, 10nt, 11nt, 12nt, 13nt, 14nt, 15nt, and so on.
In some embodiments, the first sequencing linker includes, but is not limited to, any one of a P5-terminal sequencing linker of the Illumina sequencing platform, a P2-terminal sequencing linker of the MGI sequencing platform.
In some embodiments, the second sequencing linker includes, but is not limited to, any of a P7-terminal sequencing linker of the Illumina sequencing platform, a P1-terminal sequencing linker of the MGI sequencing platform.
The following examples use a library for the widely used Illumina sequencing platform, but the method is compatible with other NGS platforms; only the adaptors need to be replaced with those of the corresponding sequencing platform.
Example 1
A mutant cell-free DNA (cfDNA) standard with a mutation frequency of 0.03% (three ten-thousandths) was prepared and divided into three equal 30 ng aliquots, one for each of three independent library construction experiments. Sequencing libraries were prepared by the method of this example (Example 1), an existing hybridization capture library construction method (Comparative Example 1), and an existing amplicon library construction method (Comparative Example 2). The target gene regions designed for the three experiments were essentially the same. The libraries were sequenced on the same high-throughput sequencing platform to the same data volume, and the same data analysis pipeline was used to examine mutation detection at the same 8 target gene sites, in order to evaluate the performance differences among the three target gene library construction methods.
The standard used in this example was purchased from Jinglian Genesis technologies (Shenzhen) Limited, specifically the lung cancer ctDNA standard set GW-OCTM009, which contains a wild-type DNA standard and a ctDNA standard with a mutation frequency of 0.1%. The two were mixed at a ratio of 7:3 (wild-type : 0.1% ctDNA) to obtain a diluted standard with a mutation frequency of 0.03%.
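The dilution arithmetic can be checked with a short calculation. The following is a minimal sketch, assuming the two standards are combined in proportion to DNA mass (or equivalently by volume at equal concentration); the function name is hypothetical and not part of the original method.

```python
def mixed_mutation_frequency(parts):
    """Mutation frequency of a mixture.

    parts: list of (relative_amount, mutation_frequency) pairs, where the
    relative amounts are assumed to be proportional to DNA mass.
    """
    total = sum(amount for amount, _ in parts)
    return sum(amount * freq for amount, freq in parts) / total


# 7 parts wild-type standard (0% mutant) : 3 parts ctDNA standard (0.1% mutant)
frequency = mixed_mutation_frequency([(7, 0.0), (3, 0.001)])
print(f"{frequency:.4%}")
```

Three tenths of 0.1% gives the 0.03% stated for the diluted standard.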
The target detection sites are shown in table 1.
TABLE 1
All required oligomers (oligos) are shown in tables 2 and 3 below (synthesized by tsinggis biotechnology limited, tokyo, HPLC purification).
TABLE 2
TABLE 3
The structure of the target gene probe carrying the first sequencing linker (IS1-UMI-GSP) is illustrated as follows:
"ACACTCTTTCCCTACACGACGCTCTTCCGATCT" is the first sequencing linker, "NNNNNNNNN" is the molecular tag, "xxxxxxxxxxxxxxxxxx" is the sequence that is complementary paired to the target gene region.
In the structure of P7-index-1, the underlined sequence, "TGATAG", is the sample tag.
The symbols in Tables 2 and 3 are described below: (1) IS2revcomp-sp-Pho and IS 2-spline anneal to form the hairpin adapter, i.e., the second sequencing adapter. The mixture containing the probe pool shown in Table 3 is named the Rui Fa 4 gene panel.
(2) "N" represents a random base.
(3) "X" represents a sequence that pairs complementarily with the target gene region; each such sequence is 20 nucleotides long, and one probe is placed every 10 nucleotides along the target gene region, i.e., 2× tiling coverage.
(4) "Pho" represents a phosphate group.
(5) "" indicates a disulfide bond that strengthens the linkage between nucleotides and prevents degradation of the polynucleotide.
(6) In the reverse strand of the second sequencing linker, "spacer C12" indicates a 12-carbon spacer without bases, which prevents non-specific binding of primers.
(7) In the reverse strand of the second sequencing linker, "AmC6" indicates an amino modification on a 6-carbon linker, which blocks the 3' end of the polynucleotide.
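The probe layout described in the notes above (first sequencing linker, 9-nt molecular tag, then a 20-nt target-binding sequence placed every 10 nt) can be sketched as follows. This is an illustrative sketch only: the template sequence is made up, the function names are hypothetical, and it assumes the supplied sequence is the strand the probes anneal to, so each binding region is taken as the reverse complement of a template window.

```python
FIRST_LINKER = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # first sequencing linker from the text
MOLECULAR_TAG = "N" * 9                              # "NNNNNNNNN" from the text

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_probes(template, probe_len=20, step=10):
    """Build IS1-UMI-GSP style probes tiled along `template`.

    A 20-nt window is taken every 10 nt, so interior bases are covered by
    two probes (2x tiling); the first and last 10 nt are covered once.
    """
    probes = []
    for start in range(0, len(template) - probe_len + 1, step):
        window = template[start:start + probe_len]
        binding_region = reverse_complement(window)  # pairs with the template window
        probes.append(FIRST_LINKER + MOLECULAR_TAG + binding_region)
    return probes


template = "ACGT" * 10  # hypothetical 40-nt target region
probes = tile_probes(template)
for probe in probes:
    print(probe)
```

For a 40-nt region this yields three probes starting at template positions 0, 10 and 20.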
The reagents and instrumentation are described below:
1. T4 DNA Ligase (Rapid) (cat. no. N103-01; Nanjing Nuoweizan Biotech Co., Ltd.) was used for each ligation reaction.
2. The library amplification reaction was performed using VAHTS HiFi Amplification Mix (cat. no. N616-01; Nanjing Nuoweizan Biotech Co., Ltd.).
3. PCR products were purified with VAHTS DNA Clean Beads magnetic beads (cat. no. N411-01; Nanjing Nuoweizan Biotech Co., Ltd.).
4. The control group used an International Universal methylation library kit (for Illumina) (Swift Biosciences, USA, Catalog No. 30024).
5. Extension of the primer reverse-complementary to the single-stranded linker was performed using DNA polymerase I Klenow fragment (cat. no. N104-01; Nanjing Nuoweizan Biotech Co., Ltd.).
6. The T4 RNA ligase buffer (10×) and FastAP (1 U/μL) required for the DNA sample dephosphorylation reaction were cat. no. B0216L from NEB Limited and cat. no. EF0651 from Enwei Jie (Shanghai) trade Limited, respectively.
7. Streptavidin magnetic beads for binding single-stranded ligation products: Dynabeads™ MyOne™ Streptavidin C1 (Yinxie Jie (Shanghai) trade Co., Ltd., cat. no. 65001).
8. Ultrapure water used in each experimental step: UltraPure™ DNase/RNase-Free Distilled Water (Yinxie Jie Co., Ltd., cat. no. 10977023).
9. Instruments: ABI Veriti 96 PCR instrument (Yinxie Jie (Shanghai) trade Co., Ltd.), constant-temperature mixer (Hangzhou Youning, cat. no. HC-100), four-dimensional rotating mixer (BE-1100, Nihlin Beier instruments manufacturing Co., Ltd., Haimen), magnetic rack (Wuxi Baige Biotechnology Co., Ltd., cat. no. BMB16-1.5-2), Qubit™ 4 Fluorometer with WiFi (Yinxi Weijie trading Limited, cat. no. Q33238), Bioptic fully automatic multiplex nucleic acid detection system (Hangzhou Kagazei Biotech Limited, cat. no. Qsep-100), and Eppendorf pipettors with 1000 μL, 100 μL and 10 μL ranges (Eppendorf, Germany).
The TE buffer composition of this example was as follows: 10mmol/L Tris-HCl, 1mmol/L EDTA, pH 8.0.
As shown in fig. 1, the experimental procedure is as follows:
1. The lung cancer ctDNA standard set GW-OCTM009 (Jinglian Genesis technologies (Shenzhen) Limited) contains a wild-type DNA standard and a ctDNA standard with a mutation frequency of 0.1%. The two were mixed at wild-type DNA standard : 0.1% ctDNA standard = 7:3 to form a cfDNA sample with a mutation frequency of 0.03%; 30 ng of this sample was denatured at 95 ℃ for 2 minutes.
2. Preparation of second sequencing linker
2.1 The following reaction system was set up in a 200 μL PCR tube:
TABLE 4
2.2 Annealing reaction conditions: 95 ℃ for 10 seconds, then cooling to 14 ℃ at a rate of 0.1 ℃/s (RAMP 4%).
2.3 Add 50 μL of TE buffer to the above reaction product (50 μL); the final concentration of the second sequencing linker in the resulting system is 100 μM. The prepared product can be stored long-term at -20 ℃ or for 8 hours at 4 ℃.
3. The target gene probes carrying the first sequencing adaptor at the 5' end listed in Table 3 were mixed in equimolar amounts to obtain a mixture in which the final concentration of the target gene probes carrying the first sequencing adaptor at the 5' end is 200 μM.
4. The target gene probe mixture carrying the first sequencing adaptor at the 5' end was annealed and extended as follows:
For each single-tube reaction, the number of target gene sites to be detected can range from 1 to 10,000, and each site corresponds to a primer with a specific target gene binding region, so up to 10,000 probes can be mixed in each single-tube reaction. The number of target detection sites in this example is 8, as shown in Table 1.
The following reaction system was set up in a 200 μL PCR tube:
TABLE 5
Components | Volume (μL) |
Target gene probe mixture with first sequencing adaptor at 5' end | 5 |
cfDNA sample with 0.03% mutation frequency, 30 ng (20 ng/μL) | 1.5 |
Ultrapure water | 18.5 |
VAHTS HiFi Amplification Mix | 25 |
Total volume | 50 |
Vortex to mix, centrifuge briefly, and place in the PCR instrument for the following reaction:
The multiplex variant-site primers carrying the Illumina P5 adaptor and the molecular tag at the 5' end are annealed to the target regions of the genome and extended in the PCR instrument under the following reaction conditions:
TABLE 6
After the reaction was complete, the product was purified with VAHTS DNA Clean Beads following the standard procedure for bead purification of PCR products, and the final product was eluted in 20 μL of ultrapure water.
5. The following reaction system was set up in a 200 μL PCR tube:
TABLE 7
The total reaction volume was 50 μL. The ligation reaction was performed at 37 ℃ for 1 hour in a PCR instrument, followed by 95 ℃ for 1 to 10 minutes to denature the ligated product into single strands, thereby removing the N-carrying complementary strand of the hairpin linker.
6. The following reaction system (Illumina indexing PCR) was set up directly in the 200 μL PCR tube containing the second-adaptor ligation product:
TABLE 8
Components | Volume (μ L) |
Second adaptor ligation product | 48 |
P7-index-1 | 0.1 |
P7-index-2 | 0.1 |
P7-index-3 | 0.1 |
P7-index-4 | 0.1 |
P7-index-5 | 0.1 |
P7-index-6 | 0.1 |
P7-index-7 | 0.1 |
P7-index-8 | 0.1 |
P7-index-9 | 0.1 |
P7-index-10 | 0.1 |
IS4_indPCR.P5 | 1 |
VAHTS HiFi Amplification Mix | 50 |
Total volume | 100 |
A 100 μL reaction system was prepared as shown in Table 8, and PCR was performed under the following conditions:
TABLE 9
After the reaction was complete, the product was purified with VAHTS DNA Clean Beads following the standard procedure for bead purification of PCR products, and the final product was eluted in 25 μL of ultrapure water to obtain an Illumina target gene library carrying a sample tag at the P7 end, i.e., a library ready for sequencing.
Comparative example 1
This comparative example provides a hybridization capture control experiment.
The lung cancer ctDNA standard set GW-OCTM009 (Jinglian Genesis technologies (Shenzhen) Limited), which contains a wild-type DNA standard and a ctDNA standard with a mutation frequency of 0.1%, was mixed at wild-type DNA standard : 0.1% ctDNA standard = 7:3 to form a DNA sample with a mutation frequency of 0.03%; the total mass of the sample was 30 ng. Using the same probe library as in Example 1, i.e., the Rui Fa 4 gene panel, and a library construction and hybridization capture kit from Nanjing Kingsry Biotechnology GmbH, library construction was carried out according to the standard operating procedure, including pre-capture amplification, hybridization capture and post-capture amplification, to obtain a library ready for sequencing.
Comparative example 2
This comparative example provides an amplicon banking control experiment (based on multiplex PCR technology).
The lung cancer ctDNA standard set GW-OCTM009 (Jinglian Genesis technologies (Shenzhen) Limited), which contains a wild-type DNA standard and a ctDNA standard with a mutation frequency of 0.1%, was mixed at wild-type DNA standard : 0.1% ctDNA standard = 7:3 to form a DNA sample with a mutation frequency of 0.03%, of which 30 ng was used. Using the same probe library as in Example 1, i.e., the Rui Fa 4 gene panel, and an amplicon library construction kit provided by Nanjing Kingsrie Biotech, Inc., amplicon library construction was carried out according to the standard operating procedure to obtain a library ready for sequencing.
Sequencing on machine
The library products prepared in Example 1, Comparative Example 1 and Comparative Example 2 were quantified with the Qubit 4.0, and 20 ng of each library was sequenced. The sequencer was an Illumina HiSeq 4000, the sequencing strategy was PE150, and the data volume per sample was 1 Gb.
Sequencing data quality control and analysis process
Raw data were processed with fastp; reads were aligned to the genome with BWA (Burrows-Wheeler Alignment Tool, BWA-MEM algorithm) against the GRCh38 reference genome (also known as hg38, the internationally used human reference genome sequence); and duplicates were marked with sambamba (markdup).
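The three tools named above could be chained roughly as follows. This is a sketch under stated assumptions, not the pipeline actually used: the text names only the tools, so the file names, thread count, intermediate SAM/BAM conversion and sorting steps, and the helper function are all hypothetical, and the command-line options shown are common usage of fastp, bwa and sambamba rather than settings taken from the source. The sketch only builds the command strings; it does not run them.

```python
import shlex


def build_pipeline(sample, r1, r2, reference, threads=4):
    """Return shell command strings for QC, alignment and duplicate marking."""
    q = shlex.quote
    clean1 = q(f"{sample}.clean.R1.fq.gz")
    clean2 = q(f"{sample}.clean.R2.fq.gz")
    sam = q(f"{sample}.sam")
    bam = q(f"{sample}.bam")
    sorted_bam = q(f"{sample}.sorted.bam")
    marked_bam = q(f"{sample}.markdup.bam")
    return [
        # Read QC and trimming of the paired-end FASTQ files
        f"fastp -i {q(r1)} -I {q(r2)} -o {clean1} -O {clean2}",
        # Alignment with the BWA-MEM algorithm (reference must be bwa-indexed)
        f"bwa mem -t {threads} {q(reference)} {clean1} {clean2} > {sam}",
        # SAM to BAM conversion, coordinate sorting, then duplicate marking
        f"sambamba view -S -f bam -o {bam} {sam}",
        f"sambamba sort -o {sorted_bam} {bam}",
        f"sambamba markdup {sorted_bam} {marked_bam}",
    ]


commands = build_pipeline(
    "example1", "example1_R1.fq.gz", "example1_R2.fq.gz", "GRCh38.fa"
)
for command in commands:
    print(command)
```

Position-based duplicate marking does not use the molecular tags carried by the probes; any tag-aware handling would be an additional step that the text does not describe.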
The analytical results were as follows:
The sequencing results for the library of Example 1 are a collection of reads demultiplexed by 10 indexes, as detailed in the following table:
TABLE 10
Index No. | Number of reads | Proportion |
1 | 666670 | 9.99% |
2 | 668281 | 10.01% |
3 | 666578 | 9.99% |
4 | 666694 | 9.99% |
5 | 666646 | 9.99% |
6 | 666765 | 9.99% |
7 | 666473 | 9.98% |
8 | 666657 | 9.99% |
9 | 669152 | 10.02% |
10 | 666349 | 9.98% |
Reads not assigned to any index | 4673 | 0.07% |
Total reads number | 6674938 | 100.00% |
As can be seen from the above table, the distribution bias of reads among the indexes is low (the read counts assigned to each index are close), and reads that could not be assigned to any index account for 0.07% (seven ten-thousandths) of the total reads, which indicates that the P7-end sample-tag indexing amplification system used in Example 1 can accurately perform pooled target gene library construction and sequencing of multiple samples.
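The proportions in Table 10 can be recomputed from the read counts. The sketch below uses only the counts listed in the table; the variable names are illustrative.

```python
# Read counts per index, copied from Table 10
index_reads = {
    1: 666670, 2: 668281, 3: 666578, 4: 666694, 5: 666646,
    6: 666765, 7: 666473, 8: 666657, 9: 669152, 10: 666349,
}
unassigned_reads = 4673

total_reads = sum(index_reads.values()) + unassigned_reads
shares = {index: count / total_reads for index, count in index_reads.items()}

for index, share in shares.items():
    print(f"index {index}: {share:.2%}")
print(f"unassigned: {unassigned_reads / total_reads:.2%}")

# Spread between the most and least represented index, as a fraction of all reads
spread = max(shares.values()) - min(shares.values())
print(f"max-min spread: {spread:.3%}")
```

The counts sum to the 6674938 total reads given in the table, and each index receives close to one tenth of the reads.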
The mutation detection results are given in the following table:
TABLE 11
In Table 11, raw base refers to the raw data volume.
GC content is the proportion of guanine and cytosine bases.
Q30 is the proportion of bases with a base-call accuracy of at least 99.9%.
depth in target refers to the sequencing depth at the target site.
ref_reads is the number of reads matching the human reference genome at the site.
alt reads is the number of reads carrying the mutation (variant).
MAF (mutation allele frequency) is the mutation frequency, specifically the ratio of alt reads to ref_reads.
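The MAF definition above can be written out as a small calculation. The sketch below implements the ratio as the text states it (alt reads divided by ref_reads) and, for comparison, the variant commonly used elsewhere in which the denominator is the sum of alt and reference reads; which of the two Table 11 actually uses is not stated, so the second function is an assumption added for illustration. The read counts are hypothetical and are not taken from Table 11.

```python
def maf_alt_over_ref(alt_reads, ref_reads):
    """MAF as defined in the text: alt reads divided by ref_reads."""
    return alt_reads / ref_reads


def maf_alt_over_total(alt_reads, ref_reads):
    """Alternative convention (an assumption): alt / (alt + ref)."""
    return alt_reads / (alt_reads + ref_reads)


# Hypothetical counts at one site, chosen only to illustrate the arithmetic
alt_reads, ref_reads = 3, 9997
as_in_text = maf_alt_over_ref(alt_reads, ref_reads)
alternative = maf_alt_over_total(alt_reads, ref_reads)
print(f"{as_in_text:.4%} vs {alternative:.4%}")
```

At mutation frequencies of a few ten-thousandths, the two conventions differ only marginally.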
As can be seen from the above table, the sequencing data quality of the library constructed in Example 1 is higher than that of the libraries constructed by the two existing techniques; specifically, the Q30 proportion (the proportion of bases with a base-call accuracy of at least 99.9%) is higher. In addition, the target gene mutation frequency detected with the library of Example 1 is closer to the true value, i.e., the MAF (mutation allele frequency) is closer to the preset value of 0.03%. The library construction method of Example 1 therefore performs better when sequencing specific target genes of complex genomes such as the human genome, and it takes less time: the hybridization capture library construction of Comparative Example 1 requires 72-80 hours and the amplicon library construction of Comparative Example 2 requires 24-32 hours, whereas the method of Example 1 requires only 10 hours. Example 1 also involves fewer steps and fewer reagents and consumables, so its cost is lower. In conclusion, the library construction method of Example 1 has broader application prospects in clinical testing, medical research and genome science research.
Example 2
In this example, DNA extracted from formalin-fixed, paraffin-embedded (FFPE) tissue standards (purchased from Jinglian Genesis technologies (Shenzhen) Limited, specifically a tumor wild-type FFPE standard and a tumor SNV 5% FFPE standard) was used to prepare a tumor mutation standard with a mutation frequency of 0.05% (five ten-thousandths). Three equal 30 ng aliquots of this DNA standard were used in three independent library construction experiments, in which sequencing libraries were prepared by the method of this example, a hybridization capture library construction method, and an amplicon library construction method, respectively. The target gene regions designed for the three experiments were essentially the same. The libraries were sequenced on the same high-throughput sequencing platform to the same data volume, and the same data analysis pipeline was used to examine the same 7 target gene mutation sites (distributed in exon regions of 4 genes, NRAS, KRAS, PIK3CA and EGFR, which constitute the detection content of the Rui Fa 4 gene panel), in order to evaluate the performance differences among the three target gene library construction techniques (this example, hybridization capture library construction and amplicon library construction).
The DNA standards were purchased from Jinglian Genesis technologies (Shenzhen) Limited, specifically a tumor wild-type FFPE standard (mutation frequency 0, cat. no. GW-OPSM005) and a tumor SNV 5% FFPE standard (cat. no. GW-OPSM003).
DNA was extracted from the FFPE standards using a magnetic-bead paraffin-embedded tissue DNA extraction kit (cat. no. D6323-02B) from Guangzhou Meiji biological science and technology Limited.
Total FFPE DNA was fragmented (i.e., long fragments of 10 kb or more were cut into short fragments of 200-500 bp) using the KAPA Frag Kit for Enzymatic Fragmentation (cat. no. KK8600) from KAPA Biosystems, USA.
The target detection sites are shown in the table below.
TABLE 12
All oligomers (oligos) required are shown in tables 13 and 14 below (synthesized by tsingtaury biotechnology limited, tokyo, HPLC purification).
TABLE 13
TABLE 14
The reagents and apparatus were the same as in example 1.
The TE buffer composition of this example was as follows: 10mmol/L Tris-HCl, 1mmol/L EDTA, pH 8.0.
The experimental procedure was as follows:
1. Total DNA was extracted from the tumor wild-type FFPE standard (Jinglian Genesis technologies (Shenzhen) Limited, mutation frequency 0, cat. no. GW-OPSM005) and the tumor SNV 5% FFPE standard (Jinglian Genesis technologies (Shenzhen) Limited, cat. no. GW-OPSM003) using the magnetic-bead paraffin-embedded tissue DNA extraction kit (cat. no. D6323-02B; Guangzhou Meiji Bioscience and technology Limited) according to the kit's standard operating procedure; the DNA was eluted in a final volume of 50 μL.
2. Concentrations were measured with the Qubit 4.0: the wild-type and SNV 5% FFPE DNA were 15.54 ng/μL and 14.78 ng/μL, respectively, for totals of 777 ng and 739 ng. 297 ng of tumor wild-type FFPE standard DNA was mixed with 3 ng of tumor SNV 5% FFPE standard DNA (a mass ratio of 99:1) to form 300 ng of FFPE DNA sample with a mutation frequency of 0.05%, and the mixture was vortexed thoroughly.
3. 30 ng of the product from the previous step was placed in a 200 μL PCR tube and fragmented using the KAPA Frag Kit for Enzymatic Fragmentation from KAPA Biosystems, USA.
4. The product of the previous step (still in the original 200 μL PCR tube) was denatured in a PCR instrument at 95 ℃ for 2 minutes.
5. Preparation of second sequencing linker
5.1 The following reaction system was set up in a 200 μL PCR tube:
TABLE 15
5.2 Annealing reaction conditions: 95 ℃ for 10 seconds, then cooling to 14 ℃ at a rate of 0.1 ℃/s (RAMP 4%).
5.3 Add 50 μL of TE buffer to the reaction product (50 μL); the final concentration of the second sequencing linker is 100 μM.
6. The target gene probes carrying the first sequencing adaptor at the 5' end listed in Table 14 were mixed in equimolar amounts to give a mixture in which the final concentration of each target gene probe carrying the first sequencing adaptor at the 5' end is 200 μM.
7. The target gene probe mixture carrying the first sequencing adaptor at the 5' end was annealed and extended as follows:
For each single-tube reaction, the number of target gene sites to be detected can range from 1 to 10,000, and each site corresponds to a primer with a specific target gene binding region, so up to 10,000 probes can be mixed in each single-tube reaction. The number of target detection sites in this example is 7, as shown in Table 12.
The following reaction system was set up in a 200 μL PCR tube:
TABLE 16
Vortex to mix, centrifuge briefly, and place in the PCR instrument for the following reaction:
The multiplex variant-site primers carrying the Illumina P5 adaptor and the molecular tag at the 5' end are annealed to the target regions of the genome and extended in the PCR instrument under the following reaction conditions:
TABLE 17
After the reaction was complete, the product was purified with VAHTS DNA Clean Beads following the standard procedure for bead purification of PCR products, and the final product was eluted in 20 μL of ultrapure water.
8. The following reaction system was set up in a 200 μL PCR tube:
TABLE 18
Components | Volume (μ L) |
Target gene probe extension product | 19 |
2×Rapid Ligation Buffer | 25 |
Second sequencing adaptor (100 μM) | 2 |
T4 DNA Ligase(Rapid)(600U/μL) | 4 |
Total volume | 50 |
The total reaction volume was 50 μL. The ligation reaction was performed at 37 ℃ for 1 hour in a PCR instrument, followed by 95 ℃ for 1 to 10 minutes to denature the ligated product into single strands, thereby removing the N-carrying complementary strand of the hairpin linker.
9. The reaction system shown in Table 8 (Illumina indexing PCR) was set up directly in the 200 μL PCR tube containing the second-adaptor ligation product.
A 100 μL reaction system was prepared as shown in Table 8, and PCR was performed under the conditions shown in Table 9.
After the reaction was complete, the product was purified with VAHTS DNA Clean Beads following the standard procedure for bead purification of PCR products, and the final product was eluted in 25 μL of ultrapure water to obtain an Illumina target gene library carrying a sample tag at the P7 end, i.e., a library ready for sequencing.
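The extraction yields and the 99:1 mass mix described in step 2 of the procedure above can be checked numerically. This is a minimal sketch using the concentrations, elution volume and masses stated in the text; the function names are hypothetical, and it assumes mutation frequency scales linearly with the mass fraction of the 5% standard.

```python
def total_yield_ng(concentration_ng_per_ul, volume_ul):
    """Total DNA mass in an eluate."""
    return concentration_ng_per_ul * volume_ul


def masses_for_target(total_ng, stock_frequency, target_frequency):
    """Masses of wild-type and mutant stock needed for a target frequency.

    Assumes frequency scales linearly with the mass fraction of the stock.
    """
    mutant_ng = total_ng * target_frequency / stock_frequency
    return total_ng - mutant_ng, mutant_ng


# Yields from the 50 uL eluates measured with the Qubit
print(round(total_yield_ng(15.54, 50)), round(total_yield_ng(14.78, 50)))

# 300 ng at 0.05% from a 5% stock
wild_type_ng, mutant_ng = masses_for_target(300, 0.05, 0.0005)
print(f"{wild_type_ng:.1f} ng wild-type + {mutant_ng:.1f} ng SNV 5% stock")
```

The result reproduces the 777 ng and 739 ng totals and the 297 ng plus 3 ng split given in step 2.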
Comparative example 3
This comparative example provides a hybridization capture control experiment.
A tumor wild-type FFPE standard and a tumor SNV 5% FFPE standard purchased from Jinglian Genesis technologies (Shenzhen) Limited were mixed at tumor wild-type FFPE standard : tumor SNV 5% FFPE standard = 99:1 to form a DNA sample with a mutation frequency of 0.05%, of which 30 ng was used. Using the same probe library as in Example 2, i.e., the Rui Fa 4 gene panel (without IS1-UMI-EGFR V769_ D770 insASV-1 and IS1-UMI-EGFR V769_ D770 insASV-2), and a library construction and hybridization capture kit purchased from Nanjing Kingnshire Biotech Co., Ltd., library construction was carried out according to the standard procedure, including pre-capture amplification, hybridization capture and post-capture amplification, followed by sequencing.
Comparative example 4
This comparative example provides an amplicon banking control experiment (based on multiplex PCR technology).
A tumor wild-type FFPE standard and a tumor SNV 5% FFPE standard purchased from Jinglian Genesis technologies (Shenzhen) Limited were mixed at tumor wild-type FFPE standard : tumor SNV 5% FFPE standard = 99:1 to form a DNA sample with a mutation frequency of 0.05%, of which 30 ng was used. Using the same probe library as in Example 2, i.e., the Rui Fa 4 gene panel (without IS1-UMI-EGFR V769_ D770 insASV-1 and IS1-UMI-EGFR V769_ D770 insASV-2), and an amplicon library construction kit purchased from Osrui Biotechnology, Inc., amplicon library construction was carried out according to the standard procedure, followed by sequencing.
Sequencing on machine
The library products prepared in Example 2, Comparative Example 3 and Comparative Example 4 were quantified with the Qubit 4.0, and 20 ng of each library was sequenced. The sequencer was an Illumina HiSeq 4000, the sequencing strategy was PE150, and the data volume per sample was 1 Gb.
Sequencing data quality control and analysis process
Raw data were processed with fastp; reads were aligned to the genome with BWA (Burrows-Wheeler Alignment Tool, BWA-MEM algorithm) against the GRCh38 reference genome (also known as hg38, the internationally used human reference genome sequence); and duplicates were marked with sambamba (markdup).
The analytical results were as follows:
The sequencing results for the library of Example 2 are a collection of reads demultiplexed by 10 indexes, as detailed in the following table:
TABLE 19
As can be seen from the above table, the distribution bias of reads among the indexes is low (the read counts assigned to each index are close), and reads that could not be assigned to any index account for about seven ten-thousandths of the total reads, which indicates that the P7-end sample-tag indexing amplification system used in Example 2 can accurately perform pooled target gene library construction and sequencing of multiple samples.
The mutation detection results are given in the following table:
TABLE 20
As can be seen from the above table, the sequencing data quality of the library constructed in Example 2 is higher than that of the libraries constructed by the two existing techniques; specifically, the Q30 proportion (the proportion of bases with a base-call accuracy of at least 99.9%) is higher. In addition, the target gene mutation frequency detected with the library of Example 2 is closer to the true value, i.e., the MAF (mutation allele frequency) is closer to the preset value of 0.05%. The library construction method of Example 2 therefore performs better when sequencing specific target genes of complex genomes such as the human genome, and it takes less time: the hybridization capture library construction of Comparative Example 3 requires 72-80 hours and the amplicon library construction of Comparative Example 4 requires 24-32 hours, whereas the method of Example 2 requires only 10 hours. Example 2 also involves fewer steps and fewer reagents and consumables, so its cost is lower. In conclusion, the library construction method of Example 2 has broader application prospects in clinical testing, medical research and genome science research.
In some embodiments, the invention is applicable to a wide variety of DNA samples, including severely degraded and trace-amount samples that conventional NGS techniques cannot handle; it places no requirement on sample fragment length and does not require fragmentation of long, intact genomic DNA.
In some embodiments, the present invention integrates steps that are separate in existing library construction techniques, so the workflow is short (only about 5 hours) and easy to operate.
In some embodiments, the present invention employs self-developed reagents, which can completely eliminate dependence on imported reagents.
In some embodiments, the starting template for library construction is in single-stranded form, which suits severely degraded and trace-amount samples.
In some embodiments, because the method relies on linear amplification and on molecular tags attached directly to the target gene molecules, the errors and biases introduced by exponential amplification are reduced, enabling quantitative detection, ultra-low-frequency mutation detection and detection of genomic structural variation such as gene copy number variation, fusion genes and viral insertion sequences.
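How direct molecular labeling supports quantitative and low-frequency detection can be illustrated with a generic tag-deduplication sketch: reads sharing the same position and molecular tag are treated as copies of one original molecule and collapsed to a consensus base. This is a common approach shown for illustration, not the analysis procedure of the patent; the tuple layout, the function name and the example reads are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# (chromosome, alignment start, molecular tag, base observed at the site of interest)
Read = Tuple[str, int, str, str]


def count_molecules(reads: Iterable[Read]) -> Dict[str, int]:
    """Collapse reads into tag families and count one consensus base per family."""
    families = defaultdict(list)
    for chrom, start, tag, base in reads:
        families[(chrom, start, tag)].append(base)

    molecule_counts: Counter = Counter()
    for bases in families.values():
        # Majority vote inside a family suppresses isolated amplification
        # or sequencing errors.
        consensus = Counter(bases).most_common(1)[0][0]
        molecule_counts[consensus] += 1
    return dict(molecule_counts)


if __name__ == "__main__":
    # Hypothetical reads: two original molecules, one carrying a T allele.
    reads = (
        [("chr7", 100, "AAACCCGGG", "A")] * 3
        + [("chr7", 100, "AAACCCGGG", "T")]      # error copy, outvoted
        + [("chr7", 100, "TTTGGGCCC", "T")] * 2
    )
    print(count_molecules(reads))
```

Counting molecules rather than reads means that uneven copy numbers among families do not distort the estimated allele frequency.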
Existing NGS library construction requires fragmenting the DNA sample to a length range of 200-500 bp to match the actual read length of NGS sequencing (the most common sequencing mode at present is PE150, i.e. paired-end sequencing with 150 bp per read), and requires complex library fragment size selection during library construction (a two-step fragment selection method is generally used).
In some embodiments, the starting amount can be as low as 0.1 ng, achieving true ultra-low-input DNA library construction.
In some embodiments, the present invention is directly compatible with RNA samples once they have been reverse transcribed into cDNA and does not require second-strand synthesis, which saves material and time and avoids the errors and biases associated with random primers in existing second-strand synthesis processes.
The present invention has been described in terms of specific examples, which are provided to aid understanding of the invention and are not intended to be limiting. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.
<110> OrganizationName : Shenzhen Rui method Biotech limited
Application Project
-------------------
<120> Title : Single-molecule target gene library building method and kit thereof
<130> AppFileReference : 20I30442
<140> CurrentAppNumber :
<141> CurrentFilingDate : ____-__-__
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
agatcggaag agcacacgtc tgaactccag tcac 34
<212> Type : DNA
<211> Length : 34
SequenceName : 1
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
aagtgactgg agttcagacg tgtgctcttc cgatctnnnn nnn 43
<212> Type : DNA
<211> Length : 43
SequenceName : 2
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
caagcagaag acggcatacg agattgatag gtgactggag ttcagacgtg tgctcttccg 60
<212> Type : DNA
<211> Length : 60
SequenceName : 3
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
caagcagaag acggcatacg agattatacg gtgactggag ttcagacgtg tgctcttccg 60
<212> Type : DNA
<211> Length : 60
SequenceName : 4
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
caagcagaag acggcatacg agatcgatca gtgactggag ttcagacgtg tgctcttccg 60
<212> Type : DNA
<211> Length : 60
SequenceName : 5
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
caagcagaag acggcatacg agatatacac gtgactggag ttcagacgtg tgctcttccg 60
<212> Type : DNA
<211> Length : 60
SequenceName : 6
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
caagcagaag acggcatacg agatatagcg gtgactggag ttcagacgtg tgctcttccg 60
<212> Type : DNA
<211> Length : 60
SequenceName : 7
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
caagcagaag acggcatacg agattgttca gtgactggag ttcagacgtg tgctcttccg 60
<212> Type : DNA
<211> Length : 60
SequenceName : 8
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
caagcagaag acggcatacg agatagatac gtgactggag ttcagacgtg tgctcttccg 60
<212> Type : DNA
<211> Length : 60
SequenceName : 9
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
caagcagaag acggcatacg agattagctg gtgactggag ttcagacgtg tgctcttccg 60
<212> Type : DNA
<211> Length : 60
SequenceName : 10
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
caagcagaag acggcatacg agatgtatgt gtgactggag ttcagacgtg tgctcttccg 60
<212> Type : DNA
<211> Length : 60
SequenceName : 11
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
caagcagaag acggcatacg agatggctca gtgactggag ttcagacgtg tgctcttccg 60
<212> Type : DNA
<211> Length : 60
SequenceName : 12
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
caagcagaag acggcatacg agatcatgct gtgactggag ttcagacgtg tgctcttccg 60
<212> Type : DNA
<211> Length : 60
SequenceName : 13
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
caagcagaag acggcatacg agattcatcg gtgactggag ttcagacgtg tgctcttccg 60
<212> Type : DNA
<211> Length : 60
SequenceName : 14
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
aatgatacgg cgaccaccga gatctacact ctttccctac acgacgctct t 51
<212> Type : DNA
<211> Length : 51
SequenceName : 15
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nntgatttgt agtggagaag 60
ga 62
<212> Type : DNA
<211> Length : 62
SequenceName : 16
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nntggcctgg cttgcttacc 60
tt 62
<212> Type : DNA
<211> Length : 62
SequenceName : 17
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nngcatctgc ctcacctcca 60
cc 62
<212> Type : DNA
<211> Length : 62
SequenceName : 18
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nntccaggag gcagccgaag 60
gg 62
<212> Type : DNA
<211> Length : 62
SequenceName : 19
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nnggaaactg aattcaaaaa 60
ga 62
<212> Type : DNA
<211> Length : 62
SequenceName : 20
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nngaccttac cttatacacc 60
gt 62
<212> Type : DNA
<211> Length : 62
SequenceName : 21
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nngaaataaa tacagatctg 60
tt 62
<212> Type : DNA
<211> Length : 62
SequenceName : 22
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nnaaaaggaa ttccataact 60
tc 62
<212> Type : DNA
<211> Length : 62
SequenceName : 23
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nngacgatac agctaattca 60
ga 62
<212> Type : DNA
<211> Length : 62
SequenceName : 24
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nnacaagttt atattcagtc 60
at 62
<212> Type : DNA
<211> Length : 62
SequenceName : 25
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nntgagagac caatacatga 60
gg 62
<212> Type : DNA
<211> Length : 62
SequenceName : 26
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nntatgtcca acaaacaggt 60
tt 62
<212> Type : DNA
<211> Length : 62
SequenceName : 27
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nnagaaggtg agaaagttaa 60
aa 62
<212> Type : DNA
<211> Length : 62
SequenceName : 28
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nntcacatcg aggatttcct 60
tg 62
<212> Type : DNA
<211> Length : 62
SequenceName : 29
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nnccctccct ccaggaagcc 60
ta 62
<212> Type : DNA
<211> Length : 62
SequenceName : 30
SequenceDescription :
Sequence
--------
<213> OrganismName : Artificial Sequence
<400> PreSequenceString :
acactctttc cctacacgac gctcttccga tctnnnnnnn nnaggcagat gcccagcagg 60
cg 62
<212> Type : DNA
<211> Length : 62
SequenceName : 31
SequenceDescription :
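The layout of the listed oligonucleotides can be read off by comparing them: sequences 3-14 share a 24 nt 5' part and a 30 nt 3' part and differ only in the 6 nt between them, while sequences 16-31 share a 33 nt 5' part followed by a run of nine `n` and a 20 nt sequence-specific 3' part. The sketch below splits a sequence into these segments. The segment labels (outer part, index, inner part, adaptor, tag, target) and the function names are interpretive assumptions based on that comparison and on the claims, not annotations given in the listing.

```python
SHARED_5P_SEQ_3_TO_14 = "caagcagaagacggcatacgagat"            # 24 nt
SHARED_5P_SEQ_16_TO_31 = "acactctttccctacacgacgctcttccgatct"  # 33 nt


def _clean(seq: str) -> str:
    """Remove the block spacing used in the listing and lower-case the sequence."""
    return seq.replace(" ", "").lower()


def split_index_primer(seq: str, index_len: int = 6) -> dict:
    """Split one of sequences 3-14 into outer part, index and inner part."""
    seq = _clean(seq)
    if not seq.startswith(SHARED_5P_SEQ_3_TO_14):
        raise ValueError("sequence does not start with the shared 5' part")
    i = len(SHARED_5P_SEQ_3_TO_14)
    return {"outer": seq[:i], "index": seq[i:i + index_len], "inner": seq[i + index_len:]}


def split_probe(seq: str) -> dict:
    """Split one of sequences 16-31 into adaptor, random tag (n-run) and target part."""
    seq = _clean(seq)
    if not seq.startswith(SHARED_5P_SEQ_16_TO_31):
        raise ValueError("sequence does not start with the shared 5' part")
    body = seq[len(SHARED_5P_SEQ_16_TO_31):]
    tag_len = len(body) - len(body.lstrip("n"))
    return {"adaptor": SHARED_5P_SEQ_16_TO_31, "tag": body[:tag_len], "target": body[tag_len:]}


if __name__ == "__main__":
    seq3 = "caagcagaag acggcatacg agattgatag gtgactggag ttcagacgtg tgctcttccg"
    seq16 = "acactctttc cctacacgac gctcttccga tctnnnnnnn nntgatttgt agtggagaag ga"
    print(split_index_primer(seq3))
    print(split_probe(seq16))
```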
Claims (10)
1. A method for single molecule target gene library construction, comprising:
an extension step, comprising mixing a template molecule with a target probe connected in series with a first sequencing adaptor, wherein the target probe binds specifically to a target region of the template molecule and undergoes an extension reaction to obtain a target probe extension product;
a second sequencing adaptor ligation step, comprising adding a second sequencing adaptor to the reaction system obtained in the extension step, wherein the second sequencing adaptor contains a complementarily paired forward strand and reverse strand, the 5' end of the forward strand of the second sequencing adaptor can be connected in series to the 3' end of the target probe extension product, and the 3' end of the reverse strand is connected in series with a random nucleotide sequence; reacting to obtain a second sequencing adaptor ligation product; and then melting and removing the single strand carrying the random nucleotide sequence from the product, to obtain a single-stranded product connected in series with the first sequencing adaptor;
and a double-strand synthesis step, comprising adding a first primer and a second primer to the single-stranded product connected in series with the first sequencing adaptor and reacting to obtain an amplification product for on-machine sequencing, wherein the first primer contains a sequence that can pair complementarily with the first sequencing adaptor, and the second primer contains a sequence that can pair complementarily with the second sequencing adaptor.
2. The single molecule target gene library construction method of claim 1, wherein the template molecule is single stranded DNA;
and/or the template molecule is selected from at least one of single-stranded DNA obtained after the melting treatment of the double-stranded DNA and cDNA obtained by reverse transcription of RNA;
and/or the template molecule is derived from at least one of bisulfite-treated DNA, DNA from formalin-fixed, paraffin-embedded (FFPE) tissue, DNA extracted from forensic samples, cell-free DNA (cfDNA) contained in body fluids, and DNA extracted from ancient fossils or biological remains from archaeological excavation;
and/or the initial amount of the template molecule is from 0.1 to 1000 nanograms;
and/or, the extension product of the target probe is single-stranded DNA.
3. The method for single-molecule target gene library construction according to claim 1, wherein the 5' end of the forward strand of the second sequencing adaptor is modified with a phosphate group;
and/or a molecular tag is connected in series between the first sequencing adaptor and the target probe;
and/or the molecular tag is 4-19 nt in length.
4. The method for single molecule target gene library construction according to claim 1, wherein the random nucleotide sequence concatenated at the 3' end of the reverse strand of the second sequencing adaptor is 5-15nt in length.
5. The method of single molecule target gene banking according to claim 1 wherein the first primer contains a sequence that is complementarily pairable with the first sequencing adaptor and the second primer contains a sequence that is complementarily pairable with the second sequencing adaptor;
and/or, the first primer comprises an inner adaptor sequence, an outer adaptor sequence, the 5 'end of the inner adaptor sequence being linked in series to the 3' end of the outer adaptor sequence, the inner adaptor sequence being reverse complementary pairable to the first sequencing adaptor;
and/or, the first primer contains or does not contain a first sample tag;
and/or, when the first primer comprises a first sample tag, the first sample tag is located between the inner adaptor sequence and the outer adaptor sequence;
and/or, the first sample tag is 4-15nt in length;
and/or, the second primer comprises an inner adaptor sequence, an outer adaptor sequence, the 5 'end of the inner adaptor sequence being linked in series to the 3' end of the outer adaptor sequence, the inner adaptor sequence being reverse complementary pairable to the second sequencing adaptor;
and/or, the second primer contains or does not contain a second sample tag;
and/or, when the second primer contains a second sample tag, the second sample tag is positioned between the inner adaptor sequence and the outer adaptor sequence;
and/or, the second sample tag is 4-15 nt in length;
and/or, in the extension step, the amplification cycle number of the extension reaction is more than or equal to 1;
and/or, in the extension step, the number of amplification cycles of the extension reaction is 5-500 cycles;
and/or, in the extension step, each cycle of the reaction is as follows: 94-98 ℃ for 10-60 seconds; 55-65 ℃ for 10-60 seconds; 68-72 ℃ for 10-60 seconds.
6. The method for single-molecule target gene library construction according to claim 1, wherein in the second sequencing adaptor ligation step, the ligation reaction is carried out at 22-40 ℃ for 0.5-2 hours;
and/or, in the second sequencing adaptor ligation step, the ligase used is T4 DNA ligase.
7. The single molecule target gene library construction method according to claim 1, wherein in the second sequencing adaptor ligation step, the melting process is a denaturation process;
and/or the denaturation treatment is thermal denaturation treatment;
and/or, the thermal denaturation treatment specifically comprises heating the target molecule to at least 80 ℃ for at least 1 min;
and/or the first sequencing adaptor is selected from any one of a P5-end sequencing adaptor of an Illumina sequencing platform and a P2-end sequencing adaptor of an MGI sequencing platform;
and/or the second sequencing adaptor is selected from any one of a P7-end sequencing adaptor of an Illumina sequencing platform and a P1-end sequencing adaptor of an MGI sequencing platform.
8. The library constructed by the method of any one of claims 1 to 7.
9. A kit comprising a first sequencing adaptor and a second sequencing adaptor, wherein the first sequencing adaptor is connected in series with a target probe, the target probe can bind to a target region of a template molecule and undergo an extension reaction, the second sequencing adaptor comprises a complementarily paired forward strand and reverse strand, the 5' end of the forward strand of the second sequencing adaptor can be connected in series to the 3' end of a target probe extension product, and the 3' end of the reverse strand is connected in series with a random nucleotide sequence.
10. The kit of claim 9, wherein the template molecule is single-stranded DNA;
and/or the template molecule is at least one of single-stranded DNA obtained after the melting treatment of the double-stranded DNA and cDNA obtained by reverse transcription of RNA;
and/or, the template molecule is selected from bisulfite-treated DNA and various types of severely degraded DNA;
and/or, the types of severely degraded DNA include DNA from formalin-fixed, paraffin-embedded (FFPE) tissue, DNA extracted from forensic samples, and cell-free DNA (cfDNA) contained in body fluids;
and/or the initial amount of the template molecule is from 0.1 to 1000 nanograms;
and/or, the target probe extension product is single-stranded DNA;
and/or the 5' end of the forward strand of the second sequencing adaptor is modified with a phosphate group;
and/or a molecular tag is connected in series between the first sequencing adaptor and the target probe;
and/or the molecular tag is 4-19 nt in length;
and/or the random nucleotide sequence connected in series with the 3' end of the reverse strand of the second sequencing adaptor is 5-15 nt in length;
and/or, the kit further comprises a first primer comprising a sequence that is complementarily pairable with the first sequencing adaptor, a second primer comprising a sequence that is complementarily pairable with the second sequencing adaptor;
and/or, the first primer comprises an inner adaptor sequence, an outer adaptor sequence, the 5 'end of the inner adaptor sequence being linked in series to the 3' end of the outer adaptor sequence, the inner adaptor sequence being reverse complementary pairable to the first sequencing adaptor;
and/or, the first primer contains or does not contain a first sample tag;
and/or, when the first primer comprises a first sample tag, the first sample tag is located between the inner adaptor sequence and the outer adaptor sequence;
and/or, the first sample tag is 4-15nt in length;
and/or, the second primer comprises an inner adaptor sequence, an outer adaptor sequence, the 5 'end of the inner adaptor sequence being linked in series to the 3' end of the outer adaptor sequence, the inner adaptor sequence being reverse complementary pairable to the second sequencing adaptor;
and/or, the second primer contains or does not contain a second sample tag;
and/or, when the second primer contains a second sample tag, the second sample tag is positioned between the inner adaptor sequence and the outer adaptor sequence;
and/or, the second sample tag is 4-15 nt in length;
and/or the first sequencing adaptor is selected from any one of a P5-end sequencing adaptor of an Illumina sequencing platform and a P2-end sequencing adaptor of an MGI sequencing platform;
and/or the second sequencing adaptor is selected from any one of a P7-end sequencing adaptor of an Illumina sequencing platform and a P1-end sequencing adaptor of an MGI sequencing platform.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011533112.5A CN112575388A (en) | 2020-12-22 | 2020-12-22 | Single-molecule target gene library building method and kit thereof |
CN202111572277.8A CN114214734A (en) | 2020-12-22 | 2021-12-21 | A single-molecule target gene library building method and kit thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011533112.5A CN112575388A (en) | 2020-12-22 | 2020-12-22 | Single-molecule target gene library building method and kit thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112575388A (en) | 2021-03-30 |
Family
ID=75139421
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011533112.5A Pending CN112575388A (en) | 2020-12-22 | 2020-12-22 | Single-molecule target gene library building method and kit thereof |
CN202111572277.8A Pending CN114214734A (en) | 2020-12-22 | 2021-12-21 | A single-molecule target gene library building method and kit thereof |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN112575388A (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024138672A1 (en) * | 2022-12-30 | 2024-07-04 | 深圳华大生命科学研究院 | Improved nucleic acid capture method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101967476A (en) * | 2010-09-21 | 2011-02-09 | 深圳华大基因科技有限公司 | Joint connection-based deoxyribonucleic acid (DNA) polymerase chain reaction (PCR)-free tag library construction method |
WO2016172265A1 (en) * | 2015-04-20 | 2016-10-27 | Neogenomics Laboratories, Inc. | Method to increase sensitivity of next generation sequencing |
CN108504649A (en) * | 2017-02-24 | 2018-09-07 | 上海基致生物医药科技有限公司 | Banking process, kit and detection method is sequenced in coding bis- generations of PCR |
CN110734967A (en) * | 2018-07-19 | 2020-01-31 | 深圳华大智造科技有限公司 | adaptor composition and application thereof |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10087481B2 (en) * | 2013-03-19 | 2018-10-02 | New England Biolabs, Inc. | Enrichment of target sequences |
CN107236729A (en) * | 2017-07-04 | 2017-10-10 | 上海阅尔基因技术有限公司 | The method and kit of a kind of rapid build target nucleic acid sequencing library that enrichment is captured based on probe |
CN109234356B (en) * | 2018-09-18 | 2021-10-08 | 南京迪康金诺生物技术有限公司 | Method for constructing hybridization capture sequencing library and application |
CN110656156A (en) * | 2019-10-14 | 2020-01-07 | 湖南大地同年生物科技有限公司 | Ultralow frequency mutation nucleic acid fragment detection method, library construction method, primer design method and reagent |
Non-Patent Citations (1)
Title |
---|
张向阳: "《医学分子生物学》", 28 February 2018 * |
Also Published As
Publication number | Publication date |
---|---|
CN114214734A (en) | 2022-03-22 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210330 |