CN117711488B - Gene haplotype detection method based on long-read sequencing and application thereof
- Publication number
- CN117711488B (application number CN202311620961.8A)
- Authority
- CN
- China
- Prior art keywords
- long
- sequencing
- software
- sample
- genotype
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- C12Q1/6858 - Allele-specific amplification
- C12Q1/6869 - Methods for sequencing
- C12Q1/6883 - Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- G16B20/30 - Detection of binding sites or motifs
- G16B20/50 - Mutagenesis
- G16B30/10 - Sequence alignment; Homology search
- G16B40/00 - ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- C12Q2600/156 - Polymorphic or mutational markers
- C12Q2600/172 - Haplotypes
- Y02A90/10 - Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a gene haplotype detection method based on long-read sequencing and an application thereof. The CYP2D6 gene sequence is amplified with primers and then amplified again with specific tag indexes so that different samples are labeled. Sample libraries carrying different tags are pooled and analyzed by Nanopore sequencing. The raw sequencing data are split by sample using the specific tag index as the marker, with the index mismatch tolerance set to 1, which improves the proportion of reads that are successfully assigned. The linkage between point mutations is then inferred from the long-read sequencing results, haplotypes are predicted from that linkage, and accurate haplotype typing is achieved. Compared with existing methods, the method corrects point mutations called from Nanopore sequencing, which greatly improves the F1 value and gives more accurate and effective CYP2D6 genotyping.
Description
Technical Field
The invention relates to the field of biotechnology, and in particular to a gene haplotype detection method based on long-read sequencing and an application thereof.
Background
About 25% of drugs have been found to be metabolized by a single enzyme, cytochrome P450 2D6 (CYP2D6), which is highly expressed in the liver. Polymorphism of the CYP2D6 gene causes different genotypes to show different enzyme activities. CYP2D6 enzyme activity falls into four classes: PM (poor metabolizer), IM (intermediate metabolizer), EM (extensive, i.e. normal, metabolizer) and UM (ultrarapid metabolizer). Differences in enzyme activity strongly affect drug metabolism. One example is the 8-aminoquinoline antimalarial drug primaquine (PQ), a prodrug that requires CYP2D6 metabolism to become active. Primaquine is the only effective therapeutic for preventing Plasmodium vivax relapse, but patients with PM or IM phenotypes cannot metabolize primaquine to its active metabolite and are therefore at higher risk of relapse after cure. When PQ is used to control malaria, it is thus important to know the frequency of PM and IM phenotypes in the target population.
Current methods for detecting CYP2D6 genotypes, such as SNP microarrays, qPCR and short-read sequencing, are relatively inexpensive. However, they have limitations: they identify common CYP2D6 genotypes by detecting known mutation types and are poorly suited to detecting new mutations. Short-read sequencing can find some new variants, but because the CYP2D6 and CYP2D7 gene sequences are highly similar, reads are easily misaligned in practice and erroneous variants are called. Furthermore, these methods infer alleles from the detected variants and known allele frequencies rather than obtaining the sequence of each allele directly. The presence of rare or new alleles therefore further confounds their results.
Long-read sequencing can address some of the challenges of current short-read sequencing. However, because of its high sequencing error rate (5%-15%), variant detection both calls false variants and misses true ones. The mainstream variant-calling software currently in use still produces many false-positive variants and false-negative sites, which can strongly affect CYP2D6 genotyping. In addition, detecting both SNPs and InDels is a basic task when genomic variation is studied with long-read sequencing data. Many algorithms are available for SNP and InDel analysis, but they were developed for second-generation sequencing data and therefore do not perform well on long-read data with a high sequencing error rate.
In the prior art, bioinformatic analysis mainly uses the minimap2 + nanopolish software combination, which gives the best PPV and sensitivity among the combinations of alignment and variant-calling software that have been compared: a PPV of 79.12%, a sensitivity of 96.43% and an F1 value of 0.8692. This combination still produces many false-positive variants (hence the low PPV), mainly because of the high error rate of long-read sequencing. Such false-positive variants bias the subsequent haplotype prediction and thus affect genotyping. There is therefore a need for an efficient long-read sequencing method that genotypes CYP2D6 with high accuracy.
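As a quick check on the figures quoted above, the F1 value can be recomputed as the harmonic mean of PPV and sensitivity. This is a minimal sketch; the function name `f1_score` is introduced here for illustration only.

```python
def f1_score(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of PPV (precision) and sensitivity (recall)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# PPV and sensitivity reported above for the minimap2 + nanopolish combination.
prior_art_f1 = f1_score(0.7912, 0.9643)
print(round(prior_art_f1, 4))
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, which is why reducing false positives (raising PPV) is the main route to a higher F1 here.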
Disclosure of Invention
The present invention aims to solve at least one of the above technical problems in the prior art. To that end, it provides a gene haplotype detection method based on long-read sequencing and an application thereof. In the method, the CYP2D6 gene sequence is amplified with primers and then amplified again with specific tag indexes so that different samples are labeled. The sample libraries carrying different tags are pooled and analyzed by Nanopore sequencing. The sequencing data are split by sample using the specific tag index as the marker, with the index mismatch tolerance set to 1, which improves the proportion of reads that are successfully assigned. The linkage between point mutations is then inferred from the long-read sequencing results, haplotypes are predicted from that linkage, and accurate haplotype typing is achieved. Compared with existing methods, the method corrects point mutations called from Nanopore sequencing, which greatly improves the F1 value and gives more accurate and effective CYP2D6 genotyping.
In a first aspect of the present invention, there is provided a method for detecting a gene haplotype, comprising the steps of:
(1) Performing PCR amplification on a sample to be tested with specific primers to obtain a target fragment, and then performing PCR amplification on the target fragment again with tag primers to obtain amplification products carrying different tags;
(2) Mixing the differently tagged amplification products from different samples in equal amounts, constructing a Nanopore sequencing library, performing long-read sequencing on the library, aligning the sequencing results to a human reference genome to obtain a sorted BAM file, performing variant detection to obtain a VCF file, and correcting the VCF file with a Bayesian correction model to obtain a corrected VCF file;
(3) Phasing the corrected VCF file together with the sorted BAM file using a phase command, then executing a haplotag command on the phasing result to tag the data, and determining the gene haplotype from the tags.
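Step (3) can be sketched as a short list of command lines. The text names only the `phase` and `haplotag` commands; this sketch assumes they are the WhatsHap subcommands of the same names, which is not stated here. All file names, the `prefix` argument and the helper name `phasing_commands` are hypothetical, and the intermediate `bgzip`/`tabix` steps are added on the assumption that the tagging step reads a compressed, indexed VCF.

```python
import shlex
from typing import List


def phasing_commands(vcf: str, bam: str, reference: str, prefix: str) -> List[List[str]]:
    """Build (but do not run) the phasing and read-tagging commands."""
    phased_vcf = f"{prefix}.phased.vcf"
    tagged_bam = f"{prefix}.haplotagged.bam"
    return [
        # Phase the corrected variants using the aligned long reads.
        ["whatshap", "phase", "--reference", reference,
         "-o", phased_vcf, vcf, bam],
        # Compress and index the phased VCF before tagging.
        ["bgzip", "-f", phased_vcf],
        ["tabix", "-p", "vcf", phased_vcf + ".gz"],
        # Tag each read with the haplotype it supports.
        ["whatshap", "haplotag", "--reference", reference,
         "-o", tagged_bam, phased_vcf + ".gz", bam],
    ]


commands = phasing_commands("s1.corrected.vcf", "s1.sorted.bam", "CYP2D6_ref.fa", "s1")
for cmd in commands:
    print(shlex.join(cmd))
```

Building the commands as argument lists keeps them safe to pass to `subprocess.run` later without shell quoting problems.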
In some embodiments of the invention, the human reference genome is a CYP2D6 gene reference sequence.
In some embodiments of the invention, the specific primers are set forth in SEQ ID NOs: 1-2.
In the present invention, each specific primer comprises a binding portion that specifically targets the target sequence and a common sequence portion. The common sequence portion is used for subsequent tag attachment.
In some embodiments of the invention, the common sequence is linked to the 5' end of the target-binding portion.
In some embodiments of the invention, the tag primers are set forth in SEQ ID NOs: 3-206.
In the present invention, each tag primer comprises a tag portion and a common sequence portion.
In some embodiments of the invention, the common sequence is attached to the 3' end of the tag portion.
In some embodiments of the invention, the Bayesian correction model is:
where Gi represents the genotype of the target site, and G0, G1 and G2 represent wild type, heterozygous mutation and homozygous mutation, respectively;
A represents the variant allele frequency (AF);
P(G0), P(G1) and P(G2) are the prior probabilities of the corresponding genotypes, and P(A|G0), P(A|G1) and P(A|G2) are obtained by calculating the sample mean and sample standard deviation of the site and then fitting a normal distribution.
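The formula itself is not reproduced in this text. A form consistent with the definitions above is Bayes' rule, P(Gi|A) = P(A|Gi)·P(Gi) / Σj P(A|Gj)·P(Gj), with each P(A|Gj) taken from a fitted normal density. The sketch below works under that assumption; the means, standard deviations and priors are hypothetical illustration values, not figures from the patent.

```python
import math
from typing import Dict, Tuple


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def genotype_posteriors(
    af: float,
    likelihood_params: Dict[str, Tuple[float, float]],
    priors: Dict[str, float],
) -> Dict[str, float]:
    """Posterior P(Gi | A) for each genotype via Bayes' rule."""
    weighted = {
        g: normal_pdf(af, *likelihood_params[g]) * priors[g] for g in priors
    }
    total = sum(weighted.values())
    if total == 0.0:
        raise ValueError("all likelihoods underflowed; check the parameters")
    return {g: w / total for g, w in weighted.items()}


# Hypothetical per-genotype (mean, sd) of AF at one site, and hypothetical priors.
params = {"G0": (0.05, 0.03), "G1": (0.50, 0.08), "G2": (0.95, 0.03)}
priors = {"G0": 0.90, "G1": 0.08, "G2": 0.02}

posteriors = genotype_posteriors(0.47, params, priors)
best_genotype = max(posteriors, key=posteriors.get)
print(best_genotype, posteriors)
```

With these illustrative numbers an observed AF of 0.47 is assigned to the heterozygous genotype even though its prior is low, because the fitted likelihoods for the other two genotypes are negligible at that frequency.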
By adding Bayesian modeling to the bioinformatic analysis, the invention corrects the point mutations (including SNPs and small InDels) detected by long-read sequencing, then infers the linkage between the point mutations from the long-read sequencing results, and finally achieves accurate haplotype genotyping.
In some embodiments of the invention, the gene haplotype detection method is used for CYP2D6 genotyping.
For CYP2D6 genotyping, sequencing the long PCR amplicon allows variants to be detected clearly without interference from homologous pseudogenes, and it also allows variant analysis to be carried out directly on the long reads, which reduces the number of steps while maintaining high accuracy.
In some embodiments of the present invention, when the Bayesian correction model is used for correction, the following formula is also used to verify the result at each site:
where A represents the variant allele frequency (AF);
μ represents the mean of the site in wild-type samples;
σ represents the standard deviation of the site in wild-type samples;
If the Z value is less than 1.96, the genotype of the site in the sample to be tested is the same as wild type;
If the Z value is greater than or equal to 1.96, the genotype of the site in the sample to be tested is the genotype recorded for that site in the VCF file obtained in step (2).
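The verification formula is not reproduced in this text. The definitions of A, μ and σ and the 1.96 cutoff suggest a standard score, Z = (A − μ) / σ; the sketch below assumes that form (whether the original takes an absolute value is not stated). The function names and example numbers are hypothetical.

```python
def z_value(af: float, wt_mean: float, wt_sd: float) -> float:
    """Standard score of the observed AF against the wild-type distribution."""
    if wt_sd <= 0:
        raise ValueError("wild-type standard deviation must be positive")
    return (af - wt_mean) / wt_sd


def verify_site(af: float, wt_mean: float, wt_sd: float,
                vcf_genotype: str, threshold: float = 1.96) -> str:
    """Keep the VCF genotype only if AF differs enough from wild type."""
    if z_value(af, wt_mean, wt_sd) >= threshold:
        return vcf_genotype
    return "0/0"  # indistinguishable from wild type at this cutoff


# Hypothetical site: wild-type samples show AF 0.04 +/- 0.02 from sequencing noise.
print(verify_site(0.48, 0.04, 0.02, "0/1"))  # well above the noise level
print(verify_site(0.06, 0.04, 0.02, "0/1"))  # within the noise level
```

The cutoff of 1.96 corresponds to the familiar 95% bound of a normal distribution, so a call is kept only when its allele frequency is unlikely to arise from wild-type sequencing noise alone.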
In some embodiments of the invention, in step (3),
If phasing is possible, the phased VCF file is split into two haploid VCF files with a Perl script, genotype detection is then performed on each haploid VCF file with Stargazer software, and the haplotypes of the two haploids are combined as the final genotype result;
If phasing is not possible, genotype detection is performed on the corrected VCF file with Stargazer software to obtain the final genotype result.
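The splitting step is described as a Perl script whose contents are not given. The following Python sketch shows one way such a split could work, under several assumptions: the VCF is single-sample, phased genotypes use the `a|b` notation, each output keeps only the records where that haplotype carries a non-reference allele, and unphased heterozygous records are set aside rather than guessed. The function name and demo records are hypothetical.

```python
from typing import Iterable, List, Tuple


def split_phased_vcf(lines: Iterable[str]) -> Tuple[List[str], List[str], List[str]]:
    """Split a phased single-sample VCF into two haploid record lists."""
    hap1: List[str] = []
    hap2: List[str] = []
    unresolved: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):  # header lines go to both outputs
            hap1.append(line)
            hap2.append(line)
            continue
        fields = line.split("\t")
        gt_idx = fields[8].split(":").index("GT")
        sample = fields[9].split(":")
        gt = sample[gt_idx]
        sep = "|" if "|" in gt else "/"
        alleles = gt.split(sep)
        # Unphased heterozygous calls cannot be assigned to a haplotype.
        if len(alleles) != 2 or (sep == "/" and alleles[0] != alleles[1]):
            unresolved.append(line)
            continue
        for allele, out in zip(alleles, (hap1, hap2)):
            if allele in ("0", "."):
                continue  # this haplotype carries the reference allele
            haploid_sample = list(sample)
            haploid_sample[gt_idx] = allele
            out.append("\t".join(fields[:9] + [":".join(haploid_sample)]))
    return hap1, hap2, unresolved


demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "ref\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0|1",
    "ref\t200\t.\tC\tT\t50\tPASS\t.\tGT\t1|0",
    "ref\t300\t.\tG\tA\t50\tPASS\t.\tGT\t1/1",
    "ref\t400\t.\tT\tC\t50\tPASS\t.\tGT\t0/1",
]
hap1, hap2, unresolved = split_phased_vcf(demo)
hap1_positions = [l.split("\t")[1] for l in hap1 if not l.startswith("#")]
hap2_positions = [l.split("\t")[1] for l in hap2 if not l.startswith("#")]
print(hap1_positions, hap2_positions, len(unresolved))
```

How the downstream star-allele caller expects haploid genotypes to be encoded is not specified here, so the output genotype format should be adapted to that tool.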
In some embodiments of the invention, the method further comprises data processing after long-read sequencing, comprising:
Extracting the DNA sequence information with Guppy software and filtering out reads with Q < 8; splitting the filtered data by sample with a Python script, using the specific tag index as the marker and an index mismatch tolerance of 1; removing adapters and filtering out reads with Q < 9; aligning the reads with Minimap2 alignment software to obtain a SAM alignment file, which is then processed with Samtools software to obtain a sorted BAM file; and performing variant detection on the sorted BAM file with the mpileup and call commands of Bcftools software, using the multiallelic-caller algorithm, to obtain a VCF file.
In some embodiments of the invention, PoreChop software is used to remove the adapters.
In some embodiments of the invention, NanoFilt software is used to filter out reads with Q < 9.
In some embodiments of the invention, the Minimap2 alignment software is run in map-ont mode.
In some embodiments of the present invention, the view, sort and index commands are used in sequence when processing with Samtools software.
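The alignment and variant-calling steps above can be sketched as shell command lines. This is an illustration only: file names and the helper name are hypothetical, the exact options used in the patent beyond `map-ont` and `-m` are not given, and the option spellings shown follow recent releases of these tools and may differ for older versions.

```python
import shlex
from typing import List


def alignment_and_calling_commands(reference: str, fastq: str, prefix: str) -> List[str]:
    """Build shell command strings for alignment, sorting and variant calling."""
    q = shlex.quote
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    vcf = f"{prefix}.vcf"
    return [
        # Align Nanopore reads to the CYP2D6 reference in map-ont mode.
        f"minimap2 -ax map-ont {q(reference)} {q(fastq)} > {q(sam)}",
        # view -> sort -> index, in the order given in the text.
        f"samtools view -b -o {q(bam)} {q(sam)}",
        f"samtools sort -o {q(sorted_bam)} {q(bam)}",
        f"samtools index {q(sorted_bam)}",
        # Pile up and call variants with the multiallelic caller (-m).
        f"bcftools mpileup -f {q(reference)} {q(sorted_bam)} | "
        f"bcftools call -m -o {q(vcf)}",
    ]


pipeline = alignment_and_calling_commands("CYP2D6_ref.fa", "s1.filtered.fastq", "s1")
for command in pipeline:
    print(command)
```

The commands are returned as strings rather than executed, so they can be reviewed or written to a per-sample shell script before being run.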
In the present invention, a flowchart of the gene haplotype detection method is shown in FIG. 1.
In a second aspect, the invention provides use of the gene haplotype detection method according to the first aspect of the invention in CYP2D6 enzyme activity typing.
In the present invention, after the CYP2D6 genotype of the sample to be tested has been determined by the gene haplotype detection method according to the first aspect of the invention, CYP2D6 enzyme activity typing can be performed according to the genotype-to-enzyme-activity correspondence known in the art.
The beneficial effects of the invention are as follows:
1. The gene haplotype detection method addresses the low accuracy of CYP2D6 genotyping in the prior art. By combining simple PCR amplification with long-read sequencing, it genotypes CYP2D6 more accurately, with a marked improvement in accuracy and related metrics over the prior art.
2. The gene haplotype detection method introduces a Bayesian correction model to correct the data. It was also found that minimap2 + bcftools detects variants better than the minimap2 + nanopolish combination of the prior art, essentially reaching 100% detection in practical validation.
Drawings
FIG. 1 is a flow chart of a method for detecting a gene haplotype according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples. The starting materials, reagents or apparatus used in the examples and comparative examples were either commercially available from conventional sources or may be obtained by prior art methods unless specifically indicated. Unless otherwise indicated, assays or testing methods are routine in the art.
Example 1 Design of specific primers for the CYP2D6 gene
In the present invention, a pair of CYP2D6 gene-specific primers was designed based on the CYP2D6 gene sequence (reference sequence number NG_008376.4). The primer pair was repeatedly tested and optimized with human genomic DNA as the target sample. After optimization of the amplification system and amplification program, it specifically amplifies the complete CYP2D6 gene sequence.
The obtained CYP2D6 gene-specific primers are shown in Table 1.
TABLE 1 CYP2D6 Gene primer information Table
In SEQ ID NOs: 1-2, the bolded and underlined 5' portion of each CYP2D6 gene primer is the common sequence used to attach the specific tag index in a subsequent step.
Example 2 CYP2D6 gene haplotype detection method
This example illustrates a CYP2D6 gene haplotype detection method (two-step PCR) based on the above primer pair. The full-length CYP2D6 gene sequence in the human genome is first amplified to obtain the target fragment. The target fragment is then modified by attaching the corresponding specific tag index. The sample libraries carrying different tags are pooled and subjected to long-read sequencing to obtain the CYP2D6 gene haplotype detection result and thus the typing information.
The method comprises the following specific steps:
(1) Acquisition of genomic DNA:
In this example, the genomic DNA is derived from a peripheral blood sample (existing genomic DNA can of course also be tested directly).
Genomic DNA was extracted from 0.1-0.2 mL of peripheral blood using a nucleic acid extraction method or product conventional in the art (such as a nucleic acid extraction or purification reagent manufactured by Dongguan Bo Aoshi gene technology Co., ltd., product number: S10040). The purity and concentration of the extracted genomic DNA were initially assessed with a NanoDrop 2000, and its integrity was verified by agarose gel electrophoresis. Genomic DNA that passed these checks was used for subsequent detection.
(2) PCR amplification:
PCR amplification was performed using the extracted genomic DNA as the template and the CYP2D6 gene-specific primers carrying the common sequence from the above example. The reaction system (25 μL) is shown in Table 2.
Table 2 PCR amplification system
| Component | Amount |
| Sample DNA to be tested (genomic DNA) | 10 ng |
| LA Taq enzyme | 0.3 μL |
| 10× amplification buffer | 2.5 μL |
| dNTP | 4 μL |
| CYP2D6 gene-specific forward and reverse primers (10 μM) | 1 μL each |
| Nuclease-free water | to 25 μL |
The amplification reaction conditions were: pre-denaturation at 94 °C for 1 min; 20 cycles of denaturation at 98 °C for 10 s, annealing at 65 °C for 60 s and extension at 72 °C for 6 min; final extension at 72 °C for 10 min. This gave the amplification product.
The amplification product was purified with magnetic beads as follows: 25 μL of the PCR amplification product and 12.5 μL of magnetic beads were combined in a centrifuge tube and left to stand for 5 min; the tube was placed on a magnetic rack and the supernatant was discarded; the beads were washed twice with 75% ethanol and then resuspended in 17 μL of nuclease-free water. The supernatant was transferred to a new centrifuge tube to give the purified PCR amplification product.
The 3' end of the specific tag index was ligated to a public sequence (submitted to the trade company, ind. Strapdesk).
The obtained specific tag index sequence information linked to the common sequence is shown in Table 3.
TABLE 3 specific tag index sequence information
The purified PCR amplification product (as a template) was amplified with the above-mentioned specific tag index (as a primer) to which the common sequence was attached. The reaction system (25. Mu.L) is shown in Table 4.
TABLE 4 PCR amplification system
| Component | Amount |
| Sample DNA to be tested (purified PCR amplification product) | 16.2 μL |
| LA Taq enzyme | 0.3 μL |
| 10× amplification buffer | 2.5 μL |
| dNTP | 4 μL |
| 10 μM upstream and downstream specific tag indexes linked to the common sequence | 1 μL each |
| Nuclease-free water | To 25 μL |
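As a quick consistency check on the two reaction systems, the water volume is simply the 25 μL total minus the other components. The sketch below uses only the volumes listed in Tables 2 and 4; the helper name `water_volume` is hypothetical, and `Decimal` is used to avoid floating-point rounding.

```python
from decimal import Decimal

def water_volume(total_ul, component_volumes_ul):
    """Volume of water (in microlitres) needed to reach the total reaction volume."""
    used = sum((Decimal(v) for v in component_volumes_ul), Decimal("0"))
    return Decimal(total_ul) - used

# Table 2 without the template: enzyme, buffer, dNTP, two primers.
REMAINING_TABLE2 = water_volume("25", ["0.3", "2.5", "4", "1", "1"])

# Table 4 including the 16.2 uL of purified first-round product.
WATER_TABLE4 = water_volume("25", ["16.2", "0.3", "2.5", "4", "1", "1"])
```

Under these numbers, 16.2 μL remains for template plus water in the first reaction, and the second reaction is already at 25 μL once the 16.2 μL of purified product is added.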
The amplification reaction conditions were: pre-denaturation at 94 °C for 1 min; 15 cycles of denaturation at 98 °C for 10 s, annealing at 65 °C for 60 s, and extension at 72 °C for 6 min; final extension at 72 °C for 10 min. This gave the second amplification product (i.e., the amplicon library).
The magnetic bead purification steps above were repeated to purify the amplicon library. After two washes with 75% ethanol, 50 μL of nuclease-free water was added for resuspension, and the supernatant was transferred to a new centrifuge tube to give the purified amplicon library, which was quantified with Qubit.
(3) Long-read sequencing and result analysis:
Purified amplicon libraries derived from different test samples were mixed in equal proportions for long-read sequencing.
The initial sequencing input for the samples was estimated from the amplicon length of the CYP2D6 gene, the instructions of the Nanopore library preparation and sequencing kits (EXP-NBD104 and SQK-LSK110), and the amount of data required for the bioinformatics analysis. Long-read sequencing was then performed according to the instructions of the Nanopore library preparation and sequencing kits (EXP-NBD104, SQK-LSK110).
Bioinformatics analysis was performed on the long-read sequencing data output by the MinION, with the following steps:
Guppy software (v6.4.6) was used to extract DNA sequences from the Fast5 files, which store the nanopore signal generated by MinION sequencing; low-quality sequences (q < 8) were filtered out, and the reads that passed were written to the final Fastq sequence file. Based on the specific tag index information, a Python script with the index mismatch tolerance set to 1 split the Fastq sequence file into per-sample Fastq sequence files, one for each specific tag index. NanoPlot software (v1.40.2) was then used for quality control of the per-sample Fastq sequence files and to collect quality statistics. Adapters were removed from the per-sample Fastq sequence files with PoreChop software (v0.2.4), and low-quality sequences (q < 9) were filtered out with NanoFilt software (v2.8.0). The filtered Fastq sequence files were aligned to the CYP2D6 reference sequence (NG_008376.4) using the map-ont mode of the Minimap2 alignment software (v2.17-r941) to obtain SAM alignment files. The SAM alignment files were processed with the view, sort and index commands of Samtools software (v1.2), in that order, to obtain sorted BAM files. Bamdst software (v1.0.9) was used for quality control of the sorted BAM files over the specific amplification region and to collect statistics such as coverage. Mutation detection was performed on the sorted BAM files with the mpileup and call commands of Bcftools software (v1.12), using the parameter -m (multiallelic-caller algorithm), to obtain VCF files. Point mutations (including SNPs and small InDels) in the VCF files were then corrected with a correction model to obtain corrected VCF files.
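The splitting step with an index mismatch tolerance of 1 can be sketched as follows. This is a minimal illustration, not the Python script used in the example: it assumes the index sits at the very start of each read, counts only substitutions (Hamming distance), and uses made-up 8-nt indexes rather than the sequences of Table 3. All function and variable names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def assign_sample(read: str, indexes: Dict[str, str], max_mismatch: int = 1) -> Optional[str]:
    """Return the sample whose index matches the read prefix within max_mismatch.

    Returns None when no index matches or when more than one index matches.
    """
    hits = []
    for sample, index in indexes.items():
        prefix = read[:len(index)]
        if len(prefix) == len(index) and hamming(prefix, index) <= max_mismatch:
            hits.append(sample)
    return hits[0] if len(hits) == 1 else None

def demultiplex(reads: List[Tuple[str, str]], indexes: Dict[str, str],
                max_mismatch: int = 1) -> Dict[str, List[str]]:
    """Group read names by sample; unassigned reads are dropped."""
    bins: Dict[str, List[str]] = {sample: [] for sample in indexes}
    for name, seq in reads:
        sample = assign_sample(seq, indexes, max_mismatch)
        if sample is not None:
            bins[sample].append(name)
    return bins

# Hypothetical indexes and reads, for illustration only.
INDEXES = {"sampleA": "ACGTACGT", "sampleB": "TTGGCCAA"}
READS = [
    ("r1", "ACGTACGTGGGG"),  # exact match to sampleA
    ("r2", "TTGGCCTAGGGG"),  # one mismatch to sampleB
    ("r3", "TTGACCTAGGGG"),  # two mismatches to sampleB, left unassigned
]
BINS = demultiplex(READS, INDEXES)
```

Rejecting reads that match more than one index keeps a tolerance of 1 from silently misassigning reads when two indexes differ at only a few positions.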
Genotype prediction was performed on the corrected VCF file by the haplotype detection method, and the copy number was inferred from the allele frequency (AF) to obtain the final genotype.
The correction model is a Bayesian correction model. Because the mutation rate of the CYP2D6 gene is relatively high, not every possible mutation site can be trained in the model. The model construction in this method therefore focuses on the 396 key mutation sites recorded by PharmVar for allele identification, since these sites directly determine allele calls.
The point mutations (including SNPs and small InDels) detected in Illumina second-generation sequencing data were taken as the standard, and the correction model was built from the mutation frequencies at the corresponding base positions in the long-read sequencing data. For sites with many mutations in the Illumina sequencing results, a Bayesian genotype frequency model was constructed for the site; when a given AF is detected at a site, the probability that the site has a given genotype is calculated according to the following formula:
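The formula itself is not reproduced in this text. Given the symbols defined for it (genotypes G0, G1 and G2, priors P(Gi) and likelihoods P(A|Gi)), it is presumably Bayes' theorem over the three genotypes; the following is a reconstruction under that assumption rather than the patent's own typeset formula:

$$
P(G_i \mid A) = \frac{P(A \mid G_i)\,P(G_i)}{\sum_{j=0}^{2} P(A \mid G_j)\,P(G_j)}, \qquad i \in \{0, 1, 2\}
$$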
In the formula:
Gi represents the genotype at a site; each site has three genotypes, G0, G1 and G2, representing wild type, heterozygous mutation and homozygous mutation, respectively. A represents the frequency (AF) of the variant allele. P(G0), P(G1) and P(G2) are the prior probabilities of the corresponding genotypes, taken from the East Asian population genotype frequencies for the site in the gnomAD database (v2.1.1). P(A|G0), P(A|G1) and P(A|G2) are each obtained by calculating the sample mean and sample standard deviation of AF at the site and then fitting a normal distribution.
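A minimal sketch of how such a posterior could be computed, assuming the model is standard Bayes' theorem with a normal likelihood per genotype as the definitions above suggest. The priors and the (mean, standard deviation) pairs are invented for illustration; they are not gnomAD values or fitted parameters from the patent, and the function names are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def genotype_posteriors(af, priors, likelihood_params):
    """Posterior P(Gi | A) for each genotype, given the observed allele frequency af.

    priors: genotype -> prior probability P(Gi)
    likelihood_params: genotype -> (mean, sd) of the fitted normal for P(A | Gi)
    """
    weighted = {g: priors[g] * normal_pdf(af, *likelihood_params[g]) for g in priors}
    total = sum(weighted.values())
    return {g: w / total for g, w in weighted.items()}

# Hypothetical values for one site (illustration only).
PRIORS = {"G0": 0.49, "G1": 0.42, "G2": 0.09}
PARAMS = {"G0": (0.05, 0.05), "G1": (0.50, 0.10), "G2": (0.95, 0.05)}

POST = genotype_posteriors(0.45, PRIORS, PARAMS)
BEST = max(POST, key=POST.get)  # genotype with the highest posterior
```

With an observed AF of 0.45, nearly all posterior mass falls on the heterozygous genotype G1, because the G0 and G2 likelihoods are many standard deviations away.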
For sites with few mutations in the Illumina sequencing results (for example, sites whose results are wild type), a site-level false-positive (FP) filtering model was constructed to process them, with the following formula:
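This formula is also not reproduced in this text. From the symbols defined for it (A, μ, σ) and the 1.96 threshold, it is presumably a standard score of the observed AF against the wild-type distribution; the following reconstruction is an assumption, and the original may differ, for example by taking an absolute value:

$$
Z = \frac{A - \mu}{\sigma}
$$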
In the formula:
A represents the frequency (AF) of the variant allele. μ represents the mean of the site in wild-type samples. σ represents the standard deviation of the site in wild-type samples.
When a given AF is detected at a site, the Z value of the site can be calculated; when the Z value is < 1.96, the genotype of the site is considered to be wild type.
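The FP filter can be sketched in a few lines, assuming the Z value is the usual standard score (A − μ)/σ, which is an assumption because the formula is not shown. The wild-type mean and standard deviation below are invented for illustration, and the function names are hypothetical.

```python
def z_value(af, mu, sigma):
    """Standard score of the observed AF against the wild-type distribution."""
    return (af - mu) / sigma

def is_wild_type(af, mu, sigma, threshold=1.96):
    """True when the Z value is below the threshold, i.e. the site is called wild type."""
    return z_value(af, mu, sigma) < threshold

# Hypothetical wild-type background at one site: mean AF 0.03, SD 0.02.
LOW_AF_CALL = is_wild_type(0.05, mu=0.03, sigma=0.02)   # Z is about 1.0
HIGH_AF_CALL = is_wild_type(0.30, mu=0.03, sigma=0.02)  # Z is about 13.5
```

A low AF that sits within the wild-type noise is filtered to wild type, while a clearly elevated AF is passed on as a candidate mutation.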
In the model construction stage, because some sites in the training set have a low frequency in the East Asian population, the FP filtering model is applied first to determine the site genotype, and the Bayesian model is then used for further correction. In actual detection (for example, on a test set or on actual samples), however, the Bayesian model and the FP filtering model are applied together. That is, when the Z value is ≥ 1.96, the result corrected by the FP filtering model and the Bayesian model is output. When the Z value is < 1.96, the corrected wild-type site genotype is output, which gives a wild-type (negative) result in the subsequent steps.
Of course, if in actual detection the Z value is < 1.96 and the site still permits correction with the Bayesian model, Bayesian correction must still be applied to the site genotype output by the mutation detection software, to ensure that false-positive and false-negative results are filtered out. If use of the Bayesian model is not permitted for the site, no Bayesian correction is performed.
To test the effectiveness of the correction model, a separate sample was used for verification. The results are shown in Table 5.
Table 5 Bayesian correction model performance verification
The meaning and calculation formula of each metric in Table 5 are shown in Table 6.
Table 6 Meaning and formula of each test metric
Thus, with the 396 sites recorded by PharmVar evaluated on independent samples, correcting the sites with the established Bayesian correction model raised the F1 value from 0.9686 (uncorrected) to 0.9916 (corrected). A corrected F1 value above 0.9900 (approaching 100%, a very significant improvement) indicates that the single-base-level detection results are already very close to the Illumina sequencing results, so that allele calls are not affected by mutation detection errors. The total number of false-negative variants (FN) fell by 82.2% (267/325), and the total number of false-positive variants (FP) fell by 59.2% (129/218).
For comparison, another set of sample data was analyzed with the minimap2 + nanopolish combination (for the specific procedure, see Liau, Yusmiati et al. Nanopore sequencing of the pharmacogene CYP2D6 allows simultaneous haplotyping and detection of duplications. Pharmacogenomics J. 14, 1033-1047 (2019)) without applying the Bayesian correction model; the results are shown in Table 7.
Table 7 minimap2+nanopolish combined test results
The results show that the uncorrected minimap2 + nanopolish combination performed far worse than the method of the examples of the present invention.
In addition, at some sites that are frequent in CYP2D6 typing in China (Table 8), for example the NG_008376.4:5119 site (a key site for distinguishing alleles *10 and *39), the uncorrected software output contained false-negative variants in some samples. A false negative (FN) at this site directly affects the accuracy of allele detection (a true *10 is misjudged as *39). After correction with the Bayesian model, the F1 value at the NG_008376.4:5119 site reached 1.0000, which indicates that the correction model of the above example is highly accurate at key CYP2D6 typing sites and helps improve the accuracy of subsequent genotyping.
Table 8 NG_008376.4:5119 site Performance validation
| Type(s) | Ref | Alt | TP | FN | FP | TN | PPV | Sensitivity | F1 |
| After correction | C | T | 294 | 0 | 0 | 189 | 100% | 100% | 1.0000 |
| Before correction | C | T | 292 | 2 | 0 | 189 | 100% | 99.32% | 0.9966 |
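The metrics in Table 8 can be reproduced from the TP, FN and FP counts. The sketch below assumes the standard definitions of PPV, sensitivity and F1, since the formulas of Table 6 are not reproduced in this text; the function names are hypothetical.

```python
def ppv(tp, fp):
    """Positive predictive value (precision): TP / (TP + FP)."""
    return tp / (tp + fp)

def sensitivity(tp, fn):
    """Sensitivity (recall): TP / (TP + FN)."""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """F1: harmonic mean of PPV and sensitivity, written as 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# Counts from Table 8 for the NG_008376.4:5119 site.
BEFORE = {"tp": 292, "fn": 2, "fp": 0}
AFTER = {"tp": 294, "fn": 0, "fp": 0}

F1_BEFORE = round(f1_score(BEFORE["tp"], BEFORE["fp"], BEFORE["fn"]), 4)
F1_AFTER = round(f1_score(AFTER["tp"], AFTER["fp"], AFTER["fn"]), 4)
SENS_BEFORE_PCT = round(100 * sensitivity(BEFORE["tp"], BEFORE["fn"]), 2)
```

Under these definitions the two false negatives before correction account for the whole gap between F1 = 0.9966 and F1 = 1.0000, with PPV unchanged at 100%.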
The haplotype detection method comprises the following steps. The phase command of WhatsHap software (v1.4) is used to phase the corrected VCF file with the sorted BAM file according to the detected point mutations (including SNPs and small InDels), giving a phased VCF file. The haplotag command of WhatsHap software (v1.4) is then used to tag each read as H1, H2 or None according to the phased VCF file and the sorted BAM file, finally giving a phasing list.
When the software result shows that phasing is possible (i.e., the phasing list contains both H1 and H2), a Perl (v5.26.2) script splits the phased VCF file into two haploid VCF files (H1 and H2) according to the read names in the phasing list; genotype detection is performed on the two haploid VCF files with Stargazer software (v2.0.0), and the haplotypes of the two haploids are combined as the final genotype result. When the software result shows that phasing is not possible (i.e., the phasing list contains only None, "-"), Stargazer software (v2.0.0) is used to genotype the corrected VCF file directly to obtain the final genotype result.
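The branch between the phased and unphased paths can be sketched as a check on the phasing list. This sketch assumes the list is tab-separated text with the read name in the first column and the haplotype tag in the second, and that comment lines start with `#`; that layout, the read names and the function names are assumptions for illustration. It is not the Perl script mentioned in the text.

```python
from collections import defaultdict

def parse_haplotag_list(text):
    """Group read names by haplotype tag from a tab-separated phasing list."""
    groups = defaultdict(set)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        read_name, haplotype = fields[0], fields[1]
        groups[haplotype].add(read_name)
    return dict(groups)

def can_phase(groups):
    """Phasing is usable only when both H1 and H2 have at least one read."""
    return bool(groups.get("H1")) and bool(groups.get("H2"))

# Hypothetical phasing list (illustration only).
LIST_TEXT = "#readname\thaplotype\nread1\tH1\nread2\tH2\nread3\tH1\nread4\tnone\n"
GROUPS = parse_haplotag_list(LIST_TEXT)
PHASED = can_phase(GROUPS)
```

When `PHASED` is true the per-haplotype read sets would drive the split into two haploid VCF files; otherwise the corrected VCF file would be genotyped as a whole.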
Different independent samples were used for simulation-based verification. The results are shown in Tables 9 and 10.
Table 9 Accuracy with and without haplotype splitting
| Method | Number of accurate samples | Accuracy |
| Haplotype split | 473 | 97.93% |
| Haplotype not split | 455 | 94.20% |
Table 10 Samples with inaccurate final genotypes, with and without haplotype splitting
| Sample name | Haplotype not split | Haplotype split |
| sample1 | *10/*39 | *1/*10 |
| sample2 | *10/*39 | *2/*10 |
| sample3 | *10/*106 | *1/*10 |
| sample4 | *10/*39 | *2/*10 |
| sample5 | -/- | *2/*36 |
| sample6 | -/- | *10/*36 |
After phasing with WhatsHap software, the accuracy increased from 94.20% to 97.93%. In the comparison without haplotype splitting, the VCF file was instead phased by Stargazer software itself (via the Beagle software bundled with Stargazer), which indicates that WhatsHap phasing is more accurate than Stargazer's built-in phasing. WhatsHap software is therefore the recommended choice for phasing in Nanopore sequencing analysis.
Example 3 Method verification
To demonstrate the effectiveness of the above method, an additional set of validation samples was tested according to the above method; the results are shown in Tables 11 and 12.
TABLE 11 Genotyping accuracy of long-read sequencing
| Total number of samples | Number of samples with concordant genotype | Accuracy |
| 31 | 29 | 93.55% |
Table 12 Sample results: genotype predictions by qPCR and by long-read sequencing
The above results indicate that, with the qPCR results as the standard, genotyping accuracy reached 93.55% when the Nanopore sequencing data were processed by the bioinformatics analysis method of the above example.
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them. Any other changes, modifications, substitutions, combinations and simplifications made without departing from the spirit and principle of the present invention shall be regarded as equivalent replacements and are included in the protection scope of the present invention.
Claims (8)
1. A gene haplotype detection method, comprising the following steps:
(1) Performing PCR amplification on a sample to be tested using specific primers to obtain a target fragment, and then performing PCR amplification on the target fragment again using tag primers to obtain amplification products with different tags;
(2) Mixing in equal amounts the amplification products with different tags from different samples to be tested, constructing a Nanopore sequencing library, performing long-read sequencing on the Nanopore sequencing library, aligning the sequencing results to the human reference genome to obtain a sorted BAM file, performing mutation detection to obtain a VCF file, and correcting the VCF file with a Bayesian correction model to obtain a corrected VCF file;
(3) Phasing the corrected VCF file and the sorted BAM file using the phase command, then executing the haplotag command according to the phasing result to tag the data, and determining the gene haplotype according to the tags;
wherein the specific primers are shown in SEQ ID NO: 1-2, and the tag primers are shown in SEQ ID NO: 3-206.
2. The method of claim 1, wherein the Bayesian correction model is:
wherein Gi represents the genotype of the target site, and G0, G1 and G2 represent wild type, heterozygous mutation and homozygous mutation, respectively;
A represents the frequency (AF) of the variant allele;
P(G0), P(G1) and P(G2) are the prior probabilities of the corresponding site genotypes, and P(A|G0), P(A|G1) and P(A|G2) are each obtained by calculating the sample mean and the sample standard deviation of the site and then fitting a normal distribution.
3. The method according to claim 1, wherein, when correcting with the Bayesian correction model, the site result is also checked using the following formula:
wherein A represents the frequency (AF) of the variant allele;
μ represents the mean of the site in wild-type samples;
σ represents the standard deviation of the site in wild-type samples;
if the Z value is < 1.96, the genotype of the corresponding site of the sample to be tested is the same as wild type;
if the Z value is ≥ 1.96, the genotype of the corresponding site of the sample to be tested is the site genotype in the VCF file obtained in step (2) of claim 1.
4. The gene haplotype detection method according to claim 1, wherein in step (3), if phasing is possible, the phased VCF file is split into two haploid VCF files using a Perl script, genotype detection is then performed on the two haploid VCF files using Stargazer software, and the haplotypes of the two haploids are combined as the final genotype result;
if phasing is not possible, genotype detection is performed on the corrected VCF file using Stargazer software to obtain the final genotype result.
5. The method of claim 1, further comprising performing data processing after long-read sequencing, comprising:
extracting DNA sequence information with Guppy software and filtering out the portion with q < 8; splitting the filtered data with a Python script, using the specific tag indexes as markers and with the index mismatch tolerance set to 1; removing adapters and filtering out the portion with q < 9; performing sequence alignment with the Minimap2 alignment software to obtain a SAM alignment file; processing it with Samtools software to obtain a sorted BAM file; and performing mutation detection on the sorted BAM file with the mpileup and call commands of Bcftools software, using the multiallelic-caller algorithm, to obtain a VCF file.
6. The method of claim 5, wherein the Minimap2 alignment software uses the map-ont mode.
7. The method according to claim 5, wherein when processing using Samtools software, processing is performed using view, sort, and index commands in sequence.
8. Use of the gene haplotype detection method according to any one of claims 1-7 in CYP2D6 enzyme activity typing.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311620961.8A CN117711488B (en) | 2023-11-29 | 2023-11-29 | Gene haplotype detection method based on long-reading long-sequencing and application thereof |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN117711488A CN117711488A (en) | 2024-03-15 |
| CN117711488B true CN117711488B (en) | 2024-07-02 |
Family
ID=90157981
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311620961.8A Active CN117711488B (en) | 2023-11-29 | 2023-11-29 | Gene haplotype detection method based on long-reading long-sequencing and application thereof |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117711488B (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103459614A (en) * | 2011-01-05 | 2013-12-18 | 香港中文大学 | Non-invasive prenatal genotyping of fetal sex chromosomes |
| CN103745136A (en) * | 2013-12-26 | 2014-04-23 | 中国农业大学 | Efficient haplotype inference and deleted genotype fill method |
Family Cites Families (17)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9977861B2 (en) * | 2012-07-18 | 2018-05-22 | Illumina Cambridge Limited | Methods and systems for determining haplotypes and phasing of haplotypes |
| CN108885648A (en) * | 2016-02-09 | 2018-11-23 | 托马生物科学公司 | Systems and methods for analyzing nucleic acids |
| TW201816645A (en) * | 2016-09-23 | 2018-05-01 | 美商德萊福公司 | Integrated systems and methods for automated processing and analysis of biological samples, clinical information processing and clinical trial matching |
| US11725232B2 (en) * | 2016-10-31 | 2023-08-15 | The Hong Kong University Of Science And Technology | Compositions, methods and kits for detection of genetic variants for alzheimer's disease |
| CN107766785B (en) * | 2017-01-25 | 2022-04-29 | 丁贤根 | Face recognition method |
| MX2020012717A (en) * | 2018-05-25 | 2021-07-15 | Arca Biopharma Inc | Methods and compositions involving bucindolol for the treatment of atrial fibrillation. |
| CN109063417B (en) * | 2018-07-09 | 2022-03-15 | 福建国脉生物科技有限公司 | Genotype filling method for constructing hidden Markov chain |
| US10468141B1 (en) * | 2018-11-28 | 2019-11-05 | Asia Genomics Pte. Ltd. | Ancestry-specific genetic risk scores |
| GB202004528D0 (en) * | 2020-03-27 | 2020-05-13 | Univ Birmingham | Methods, compositions and kits for hla typing |
| CN111518917B (en) * | 2020-04-02 | 2022-06-07 | 中山大学 | Micro haplotype genetic marker combination and method for noninvasive prenatal paternity relationship determination |
| CN114250279B (en) * | 2020-09-22 | 2024-04-30 | 上海韦翰斯生物医药科技有限公司 | Construction method of haplotype |
| US20240287048A1 (en) * | 2020-10-16 | 2024-08-29 | The Broad Institute, Inc. | Substituted acyl sulfonamides for treating cancer |
| CN113555062B (en) * | 2021-07-23 | 2022-07-12 | 哈尔滨因极科技有限公司 | Data analysis system and analysis method for genome base variation detection |
| CN113564247B (en) * | 2021-09-24 | 2022-01-28 | 北京贝瑞和康生物技术有限公司 | Primer group and kit for simultaneously detecting multiple mutations of 9 genes related to congenital adrenal cortical hyperplasia |
| CN114496077B (en) * | 2022-04-15 | 2022-06-21 | 北京贝瑞和康生物技术有限公司 | Methods, devices, and media for detecting single nucleotide variations and indels |
| CN114649055B (en) * | 2022-04-15 | 2022-10-21 | 北京贝瑞和康生物技术有限公司 | Methods, devices and media for detecting single nucleotide variations and indels |
| CN117133355A (en) * | 2023-08-25 | 2023-11-28 | 山东省农业科学院畜牧兽医研究所 | An error correction and missing filling method and application of GBTS detection genotype |
Also Published As
| Publication number | Publication date |
|---|---|
| CN117711488A (en) | 2024-03-15 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Kumar et al. | Next-generation sequencing and emerging technologies | |
| Deschamps et al. | Genotyping-by-sequencing in plants | |
| Pereira et al. | Straightforward inference of ancestry and admixture proportions through ancestry-informative insertion deletion multiplexing | |
| US20160125128A1 (en) | Accurate typing of hla through exome sequencing | |
| Lu et al. | The motif composition of variable number tandem repeats impacts gene expression | |
| CN109182538B (en) | Method for genotyping and analyzing key SNPs sites rs88640083 and 2b-RAD of dairy cow mastitis | |
| US20230120825A1 (en) | Compositions, Methods, and Systems for Paternity Determination | |
| CN112513292A (en) | Method and device for detecting homologous sequence based on high-throughput sequencing | |
| Silva et al. | A 3K Axiom SNP array from a transcriptome-wide SNP resource sheds new light on the genetic diversity and structure of the iconic subtropical conifer tree Araucaria angustifolia (Bert.) Kuntze | |
| CN116312776A (en) | Method for detecting differentiated RNA editing sites | |
| CN112086131A (en) | A screening method for false positive variant sites in high-throughput sequencing | |
| CN115851964A (en) | SNP molecular marker related to milk production traits and lamb production traits of milk goats, liquid chip detection kit and application | |
| Valle-Silva et al. | Analysis and comparison of the STR genotypes called with HipSTR, STRait Razor and toaSTR by using next generation sequencing data in a Brazilian population sample | |
| Kim et al. | Validation and application of new NGS‐based HLA genotyping to clinical diagnostic practice | |
| US20200265920A1 (en) | A system for determining diplotypes | |
| CN117711488B (en) | Gene haplotype detection method based on long-reading long-sequencing and application thereof | |
| Pouseele et al. | Accurate whole-genome sequencing-based epidemiological surveillance of Mycobacterium tuberculosis | |
| JP2025013900A (en) | Methods and systems for detecting allelic imbalance in cell-free nucleic acid samples - Patents.com | |
| Xu et al. | The research of a large-scale analysis platform for MNS blood group identification based on long-read sequencing | |
| CN116083562B (en) | SNP marker combination and primer set related to aspirin resistance auxiliary diagnosis and application thereof | |
| CN109182505B (en) | Method for genotyping and analyzing key SNPs sites rs75762330 and 2b-RAD of dairy cow mastitis | |
| CN105154543A (en) | Quality control method for biological sample nucleic acid detection | |
| Ruiz-Ramírez et al. | Inter-platform evaluation of the MPSplex large-scale tri-allelic SNP panel for forensic identification | |
| Benaglio et al. | Ultra high throughput sequencing in human DNA variation detection: a comparative study on the NDUFA3-PRPF31 region | |
| CN109182504B (en) | Method for genotyping and analyzing key SNPs sites rs20438858 and 2b-RAD of dairy cow mastitis |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant |