WO2014029093A1 - Method and system for determining whether an individual is in an abnormal state - Google Patents
- Publication number
- WO2014029093A1 (PCT/CN2012/080500)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequencing
- snp
- nucleic acid
- individual
- acid sample
- Prior art date
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Definitions
- the present invention relates to the field of biomedicine and, in particular, to methods and systems for determining whether an individual has an abnormal state.
- Mendelian genetic diseases, also known as single-gene diseases (the two terms are used interchangeably herein), can be divided by mode of inheritance into autosomal dominant, autosomal recessive, sex-linked dominant, and sex-linked recessive genetic diseases.
- OMIM (Online Mendelian Inheritance in Man), the online database of human Mendelian genetic diseases
- the present invention aims to solve at least one of the technical problems existing in the prior art. To this end, the present invention proposes a method and system for determining whether an individual has an abnormal state.
- the invention proposes a method of determining whether an individual has an abnormal state.
- the method comprises: constructing a sequencing library for the nucleic acid sample of the individual; sequencing the sequencing library to obtain a sequencing result, the sequencing result consisting of a plurality of sequencing data; determining, based on the sequencing data, a known SNP contained in the sequencing result; and determining, based on the known SNP, whether the individual has an abnormal state associated with the known SNP.
- the SNPs contained in the nucleic acid sample can be efficiently determined by sequencing, and since these SNPs are related to the abnormal state, it is possible to effectively determine whether the source individual of the nucleic acid sample has an abnormal state associated with these SNPs.
- the method of determining whether an individual has an abnormal state may also have the following additional technical features:
- the individual is a human.
- a human sample can be tested by the method of determining whether an individual has an abnormal state according to an embodiment of the present invention, so that it is possible to effectively predict whether the person has an abnormal state.
- the abnormal state is a disease.
- the disease is a monogenic disease.
- the monogenic disease is at least one selected from the group consisting of thalassemia and erythrocyte glucose-6-phosphate dehydrogenase deficiency.
- the thalassemia is beta-thalassemia.
- the nucleic acid sample is at least a portion of an individual's whole genome DNA.
- the nucleic acid sample is extracted from a single cell or a microsample of the individual.
- the single cell or microsample is isolated from at least one selected from the group consisting of blood, tissue, urine, gametes, fertilized eggs, blastomeres, and embryos.
- the single cells are isolated by at least one selected from the group consisting of a dilution method, a mouth pipette separation method, micromanipulation, microdissection, flow cytometry, and microfluidics.
- constructing the sequencing library for the nucleic acid sample of the individual further comprises: amplifying the nucleic acid sample to obtain a nucleic acid sample amplification product; and constructing the sequencing library from the nucleic acid sample amplification product.
- the efficiency of constructing the sequencing library can be improved, thereby further improving the efficiency of subsequent determination of whether or not the individual has an abnormal state.
- the nucleic acid sample is whole genome DNA extracted from a single cell of an individual, for example whole genome DNA released by lysing a single cell of the individual, wherein the whole genome DNA is amplified by at least one selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA, and MDA.
- the efficiency of amplifying whole genome DNA can be further improved, thereby further improving the efficiency of subsequently determining whether an individual has an abnormal state.
- before constructing the sequencing library for the nucleic acid sample amplification product, the method further comprises: screening the nucleic acid sample amplification product with a nucleic acid probe to obtain a nucleic acid sample amplification product from a predetermined region; and constructing the sequencing library for the nucleic acid sample amplification product from the predetermined region.
- the predetermined region is at least one exon region. Thereby, the known SNPs included in the predetermined region can be effectively determined, which improves the efficiency and accuracy of determining the abnormal state associated with the known SNPs included in the predetermined region, such as an exon region.
- the nucleic acid probe is provided in the form of a chip. Thereby, the efficiency of screening using the nucleic acid probe can be further improved, thereby further improving the efficiency of subsequently determining whether the individual has an abnormal state.
- the sequencing is performed using at least one selected from the group consisting of Illumina Hiseq 2000, Genome Analyzer, SOLiD sequencing system, Ion Torrent, Ion Proton, 454, PacBio RS sequencing system, Helicos tSMS technology, and nanopore sequencing technology. Thereby, the high-throughput, deep-sequencing characteristics of these sequencing devices can be utilized to further improve the efficiency of determining whether an individual has an abnormal state.
- the sequencing is performed using Illumina Hiseq 2000, with a sequencing length of 90 bp. Thereby, the efficiency of subsequent SNP analysis can be further improved, thereby improving the efficiency of determining whether an individual has an abnormal state.
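- as a rough illustration of how a 90 bp sequencing length relates to sequencing depth, the sketch below uses the standard average-coverage estimate (depth = number of reads × read length / genome size). This relation is not stated in the text, and the function names and the genome size passed in are assumptions made for illustration only.

```python
import math


def expected_depth(read_count: int, read_length: int, genome_size: int) -> float:
    """Average depth if reads were spread evenly over the genome (an idealisation)."""
    return read_count * read_length / genome_size


def reads_for_depth(target_depth: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads whose expected average depth reaches target_depth."""
    return math.ceil(target_depth * genome_size / read_length)


if __name__ == "__main__":
    # Hypothetical toy genome of 900 bp sequenced with 90 bp reads.
    print(reads_for_depth(10, read_length=90, genome_size=900))
    print(expected_depth(100, read_length=90, genome_size=900))
```

Real depth varies from site to site, particularly after whole genome amplification of a single cell, so such an estimate only bounds the average.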
- determining the known SNPs included in the sequencing result based on the sequencing data is performed by aligning the sequencing data with a reference gene.
- the reference gene is a known human genomic sequence.
- the alignment is performed by SOAP/SOAP2 software.
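- the comparison step can be pictured with the toy sketch below. It is not the SOAP/SOAP2 algorithm: it assumes the reads are already aligned without gaps, and the function name, the `known_sites` mapping, and the `min_fraction` threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def call_known_snps(reference, aligned_reads, known_sites, min_fraction=0.2):
    """Report known SNP sites at which aligned reads show the listed alternate base.

    reference     -- reference sequence as a string (0-based coordinates)
    aligned_reads -- iterable of (start, sequence) pairs, already aligned, no gaps
    known_sites   -- dict: position -> (snp_id, alt_base); illustrative values only
    min_fraction  -- assumed minimum fraction of reads carrying the alternate base
    """
    # Build a per-position count of observed bases (a simple pileup).
    pileup = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    found = []
    for pos, (snp_id, alt) in sorted(known_sites.items()):
        counts = pileup.get(pos)
        if not counts:
            continue  # no read covers this site
        depth = sum(counts.values())
        alt_count = counts[alt]
        if alt != reference[pos] and alt_count / depth >= min_fraction:
            found.append({"id": snp_id, "pos": pos, "ref": reference[pos],
                          "alt": alt, "depth": depth, "alt_count": alt_count})
    return found
```

Only sites listed in `known_sites` are examined, which mirrors the text's focus on known SNPs rather than on discovering new variants.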
- the method further comprises filtering the known SNPs included in the sequencing result based on the following filtering conditions: the SNP calling quality value is greater than 20; the sequencing depth of the SNP site is greater than 8; the depth of the SNP site is less than 5 times the average depth of the genome; the copy number of the SNP site is not greater than 2; and the distance between the SNP site and the nearest other SNP site is greater than 5.
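- the five filtering conditions can be expressed as a single predicate, as in the minimal sketch below. The `SnpCall` record and its field names are hypothetical; the sketch also assumes that all calls lie on one chromosome and that the distance to the nearest other SNP site is measured in base pairs against all candidate calls before filtering, neither of which the text specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SnpCall:
    snp_id: str
    position: int       # coordinate of the SNP site
    quality: float      # SNP calling quality value
    depth: int          # sequencing depth at the site
    copy_number: float  # estimated copy number of the site


def filter_snps(calls: List[SnpCall], genome_mean_depth: float) -> List[SnpCall]:
    """Keep only the calls that satisfy all five filtering conditions."""
    ordered = sorted(calls, key=lambda c: c.position)
    kept = []
    for i, call in enumerate(ordered):
        # Distance to the nearest neighbouring SNP site, if any.
        gaps = []
        if i > 0:
            gaps.append(call.position - ordered[i - 1].position)
        if i < len(ordered) - 1:
            gaps.append(ordered[i + 1].position - call.position)
        nearest = min(gaps) if gaps else float("inf")

        if (call.quality > 20
                and call.depth > 8
                and call.depth < 5 * genome_mean_depth
                and call.copy_number <= 2
                and nearest > 5):
            kept.append(call)
    return kept
```

The upper depth bound and the copy-number bound both guard against sites in repeated or duplicated regions, where misaligned reads can mimic a SNP.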
- the known SNP is located in the human chromosome HBB gene region. In one embodiment of the invention, the known SNP is said to be at least one selected from the group consisting of: rs33985472, rs63750954, rs63751128, rs33978907, rs34029390, rs34809925, rs33953406, rs33910569, rs33910569, rs33971634, rs3392539 rs33946267, rs35485099, rs36015961, rs33930977, rs35256489, rs33952266, rs33952266, rs33952266, rs33914668, rs33913413, rs33913413, rs34483965, rs34793594, rs35703285,
- the invention proposes a system for determining whether an individual has an abnormal state.
- the system comprises: a sequencing library construction device for constructing a sequencing library for a nucleic acid sample of the individual; a sequencing device, connected to the sequencing library construction device, for sequencing the sequencing library to obtain a sequencing result, the sequencing result consisting of a plurality of sequencing data; a SNP determining device, connected to the sequencing device, for determining, based on the sequencing data, a known SNP included in the sequencing result; and an abnormal state determining device, connected to the SNP determining device, for determining, based on the known SNP, whether the individual has an abnormal state associated with the known SNP.
- the method of determining whether an individual has an abnormal state described above can be effectively implemented using the system according to an embodiment of the present invention, whereby the SNPs contained in a nucleic acid sample are efficiently determined by sequencing, and since these SNPs are related to abnormal states, it is possible to effectively determine whether the source individual of the nucleic acid sample has an abnormal state associated with these SNPs.
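- the chain of connected devices can be sketched as four stages, each consuming the output of the previous one. The function `build_system` and the stand-in stages used in the example are hypothetical placeholders and do not model any real laboratory instrument or software.

```python
from typing import Any, Callable


def build_system(construct_library: Callable[[Any], Any],
                 sequence: Callable[[Any], Any],
                 determine_snps: Callable[[Any], Any],
                 determine_state: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Connect the four devices so that each one feeds the next."""
    def run(nucleic_acid_sample):
        library = construct_library(nucleic_acid_sample)  # library construction device
        sequencing_data = sequence(library)               # sequencing device
        known_snps = determine_snps(sequencing_data)      # SNP determining device
        return determine_state(known_snps)                # abnormal state determining device
    return run


if __name__ == "__main__":
    # Toy stand-ins: the "sample" is just a string of whitespace-separated tokens.
    system = build_system(
        construct_library=lambda sample: sample.split(),
        sequence=lambda library: [token.upper() for token in library],
        determine_snps=lambda data: [d for d in data if d.startswith("RS")],
        determine_state=lambda snps: len(snps) > 0,
    )
    print(system("rs1 acgt"))
```

Keeping each stage behind a plain callable reflects the text's structure, in which optional units (amplification, screening, filtering) are added inside a device without changing how the devices connect.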
- the system for determining whether an individual has an abnormal state may also have the following additional technical features:
- the system further comprises: a nucleic acid sample extraction device adapted to extract at least a portion of the individual's whole genomic DNA from a single cell or a micro sample of the individual.
- the system further comprises: a biological sample separation device adapted to separate a single cell or a microsample from at least one selected from the group consisting of blood, tissue, urine, gametes, fertilized eggs, blastomeres, and embryos.
- the biological sample separation device is adapted to separate the single cell or microsample by at least one selected from a dilution method, a mouth pipette separation method, micromanipulation, microdissection, flow cytometry, and microfluidics.
- the above-described method of determining whether an individual has an abnormal state can be effectively implemented using a system for determining whether an individual has an abnormal state according to an embodiment of the present invention.
- single cells can be separated from the biological sample of the individual efficiently and conveniently for subsequent operations, thereby further improving the efficiency of subsequently determining whether the individual has an abnormal state.
- the sequencing library construction device further comprises: a nucleic acid sample amplification unit, wherein the nucleic acid sample amplification unit is adapted to amplify the nucleic acid sample to obtain a nucleic acid sample amplification product.
- the amplification unit is adapted to perform at least one selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA, and MDA.
- the above-described method of determining whether an individual has an abnormal state can be effectively implemented using a system for determining whether an individual has an abnormal state according to an embodiment of the present invention. Thereby, the efficiency of amplifying whole genome DNA can be further improved, thereby further improving the efficiency of subsequently determining whether an individual has an abnormal state.
- the sequencing library construction device further includes: a screening unit, wherein the screening unit is provided with a nucleic acid probe for screening the nucleic acid sample amplification product to obtain a nucleic acid sample amplification product from a predetermined region, so that the sequencing library is constructed for the nucleic acid sample amplification product from the predetermined region.
- the nucleic acid probe is provided in the form of a chip.
- the efficiency of screening using the nucleic acid probe can be further improved, thereby further improving the efficiency of subsequently determining whether the individual has an abnormal state.
- the sequencing device is at least one selected from the group consisting of Illumina Hiseq 2000, Genome Analyzer, SOLiD sequencing system, Ion Torrent, Ion Proton, 454, PacBio RS sequencing system, Helicos tSMS sequencing device, and nanopore sequencing device.
- the sequencing is performed using Illumina Hiseq 2000, with a sequencing length of 90 bp. Thereby, the efficiency of subsequent SNP analysis can be further improved, thereby improving the efficiency of determining whether an individual has an abnormal state.
- the SNP determining device further includes: a comparison unit configured to determine the known SNPs included in the sequencing result by comparing the sequencing data with a reference gene.
- the comparison unit is adapted to perform alignment using SOAP/SOAP2 software.
- the SNP determining device further includes: a SNP filtering unit adapted to filter the known SNPs included in the sequencing result based on the following filtering conditions: the SNP calling quality value is greater than 20; the sequencing depth of the SNP site is greater than 8; the depth of the SNP site is less than 5 times the average depth of the genome; the copy number of the SNP site is not greater than 2; and the distance between the SNP site and the nearest other SNP site is greater than 5.
- the known SNP is located in the human chromosome HBB gene region.
- the known SNP is at least one selected from the group consisting of: rs33985472, rs63750954, rs63751128, rs33978907, rs34029390, rs34809925, rs33953406, rs33910569, rs33910569, rs33971634, rs3392539 rs33946267, rs35485099, rs36015961, rs33930977, rs35256489, rs33952266, rs33952266, rs33952266, rs33914668, rs33913413, rs33913413, rs34483965, rs34793594, rs35703285, rs
- the above-described method of determining whether an individual has an abnormal state can be effectively implemented using a system for determining whether an individual has an abnormal state according to an embodiment of the present invention. It is thus possible to effectively determine whether the subject has a risk of thalassemia, especially beta-thalassemia.
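- the final determination step amounts to checking the detected SNPs against the set of known SNPs, as in the minimal sketch below. The set holds only an illustrative subset of the rs identifiers listed in the text (the full list is truncated in the source), and the function name and return shape are assumptions. The sketch ignores genotype, which this passage does not discuss.

```python
# Illustrative subset of the rs identifiers named in the text; not the full list.
HBB_KNOWN_SNPS = frozenset({
    "rs33985472", "rs63750954", "rs63751128", "rs33978907",
})


def flag_abnormal_state(detected_ids, known_ids=HBB_KNOWN_SNPS):
    """Return whether any detected SNP belongs to the known set, and which ones."""
    hits = sorted(set(detected_ids) & set(known_ids))
    return {"flagged": bool(hits), "matched_snps": hits}


if __name__ == "__main__":
    # "rs1" is a made-up identifier standing in for an unrelated SNP.
    print(flag_abnormal_state(["rs33985472", "rs1"]))
```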
- FIG. 1 is a flow chart showing a method of determining whether an individual has an abnormal state according to an embodiment of the present invention
- FIG. 2 is a flow chart showing a method of determining whether an individual has an abnormal state according to another embodiment of the present invention
- FIG. 4 is a schematic structural view of a system for determining whether an individual has an abnormal state according to an embodiment of the present invention
- FIG. 6 is a flow chart showing a method of determining whether an individual has an abnormal state, according to another embodiment of the present invention.
- Detailed description of the invention
- abnormal state as used herein shall be understood broadly, and may be any state different from the normal state of an individual such as a person, and may include, for example, a disease, an immune abnormality, or the like.
- the abnormal state is a disease.
- the type of the disease is not particularly limited, and according to a preferred embodiment of the present invention, the disease is a single gene disease.
- a single-gene disease is usually a disease or pathological trait controlled by a pair of alleles. Therefore, by detecting SNPs associated with a single-gene disease, it is possible to effectively determine whether the subject under study has the disease.
- the monogenic disease is at least one selected from the group consisting of thalassemia and erythrocyte glucose-6-phosphate dehydrogenase deficiency.
- the thalassemia is beta-thalassemia.
- the invention proposes a method of determining whether an individual has an abnormal state.
- the method includes:
- a sequencing library is first constructed for individual nucleic acid samples, which can be used for subsequent sequencing and results analysis.
- the term "individual" is used without limitation, and may be any organism containing genetic information, for example a human. Thus, a human sample can be tested using the method of determining whether an individual has an abnormal state according to an embodiment of the present invention, so that it can be effectively predicted whether the person has an abnormal state.
- the method further includes the step of extracting a nucleic acid sample from an individual.
- the term "nucleic acid sample" as used herein shall be understood broadly and may be a DNA sample or an RNA sample, or a modified or processed DNA or RNA sample, as long as its sequence can be determined by sequencing.
- the nucleic acid sample can be at least a portion of an individual's whole genome DNA.
- the whole genome DNA contains all the genetic information of the individual; thus, by sequencing and SNP analysis of the whole genome DNA, the SNP information of the individual can be obtained more effectively and completely, thereby further improving the efficiency and accuracy of the method of determining whether the individual has an abnormal state.
- the nucleic acid sample is extracted from a single cell or a microsample of the individual.
- the method and means for obtaining a nucleic acid sample from a single cell or a microsample are not particularly limited; for example, a single cell may be lysed with a lysis solution to release and collect its whole genome DNA.
- the single cell or microsample is isolated from at least one selected from the group consisting of blood, tissue, urine, gametes, fertilized eggs, blastomeres, and embryos.
- by predicting and evaluating whether an individual has an abnormal state from a small sample of the individual, the efficiency of the determination can be improved and its cost reduced.
- these samples can be easily obtained from organisms, and can be specifically sampled for certain diseases to take specific analytical measures for certain specific diseases.
- the single cells are isolated by at least one selected from the group consisting of a dilution method, a mouth pipette separation method, micromanipulation, microdissection, flow cytometry, and microfluidics.
- the obtained nucleic acid sample can be amplified, especially when it is extracted from a single cell or a microsample.
- constructing a sequencing library for the nucleic acid sample of the individual further comprises: amplifying the obtained nucleic acid sample to obtain a nucleic acid sample amplification product (S102).
- the sequencing library can then be constructed from the nucleic acid sample amplification product.
- the method of amplifying a nucleic acid sample according to an embodiment of the present invention is not particularly limited.
- the nucleic acid sample employed is whole genome DNA extracted from a single cell of an individual, and amplification of the whole genome DNA can be performed by at least one selected from PEP-PCR, DOP-PCR, OmniPlex WGA, and MDA.
- the efficiency of amplifying whole genome DNA can be further improved, thereby further improving the efficiency of subsequently determining whether an individual has an abnormal state.
- a step of lysing the single cell to release its whole genome may be further included.
- the method used for lysing a single cell and releasing the whole genome is not particularly limited, as long as the single cell is lysed, preferably sufficiently.
- the single cell can be lysed with an alkaline lysis solution to release the single-cell whole genome. The inventors have found that this effectively lyses single cells and releases the whole genome, and that the released whole genome improves sequencing accuracy, thereby further improving the efficiency of determining single cell chromosome aneuploidy.
- the method of single-cell whole genome amplification is not particularly limited; PCR-based methods such as PEP-PCR, DOP-PCR, and OmniPlex WGA may be employed, as may non-PCR-based methods such as MDA (multiple displacement amplification).
- a PCR-based method, such as the OmniPlex WGA method, is preferably employed.
- Commercial kits of choice include, but are not limited to, GenomePlex from Sigma Aldrich, PicoPlex from Rubicon Genomics, REPLI-g from Qiagen, illustra GenomiPhi from GE Healthcare, and the like.
- the single cell whole genome can be amplified using OmniPlex WGA prior to construction of the sequencing library.
- the whole genome can be efficiently amplified, thereby further improving the efficiency of determining whether the individual has an abnormal state.
- before constructing the sequencing library for the nucleic acid sample amplification product, the method further comprises: screening the nucleic acid sample amplification product with a nucleic acid probe to obtain a nucleic acid sample amplification product from a predetermined region; and constructing the sequencing library for the nucleic acid sample amplification product from the predetermined region.
- the predetermined region is at least one exon region.
- the method of screening the amplification product of the nucleic acid sample by means of the nucleic acid probe is not particularly limited, and may be a solid phase screening or a liquid phase hybridization.
- the nucleic acid probe can be provided in the form of a chip. Thereby, the efficiency of screening using the nucleic acid probe can be further improved, thereby further improving the efficiency of subsequently determining whether the individual has an abnormal state.
- the nucleic acid sample of the predetermined region can also be obtained by other known methods; for example, the nucleic acid sample is subjected to PCR using specific primers, thereby obtaining an amplification product of the predetermined region. A sequencing library of the predetermined region is then constructed, and information about the predetermined region is obtained.
- a method of constructing a sequencing library for a nucleic acid sample of an individual is not particularly limited.
- Those skilled in the art can select different methods for constructing a whole genome sequencing library according to the specific scheme of the genome sequencing technology adopted.
- For details on constructing the whole genome sequencing library, refer to the protocol provided by the manufacturer of the sequencing instrument, such as Illumina; see, for example, the Illumina Multiplexing Sample Preparation Guide (Part #1005361; Feb 2010) or the Paired-End SamplePrep Guide (Part #1005063; Feb 2010), which are incorporated herein by reference.
- sequencing can be performed using at least one selected from the group consisting of Illumina Hiseq 2000, Genome Analyzer, SOLiD sequencing system, Ion Torrent, Ion Proton, 454, PacBio RS sequencing system, Helicos tSMS technology, and nanopore sequencing technology.
- the length of the sequencing data obtained by whole genome sequencing is not particularly limited.
- the sequencing data is 90 bp in length when Illumina Hiseq2000 is used. Applicants have surprisingly found that when the length of the sequencing data is about 90 bp, analysis of the sequencing data is greatly facilitated, the analysis efficiency is improved, and the cost of the analysis is significantly reduced. The efficiency of determining whether an individual has an abnormal state is thereby further improved, and the cost of the determination is reduced.
- the length of the sequencing data refers to the average of the lengths of the individual sequencing data.
- the genetic information contained in the sequencing result can be obtained by analyzing the sequencing data included in the sequencing result, for example, SNP information can be obtained.
- the method of analyzing the sequencing data contained in the sequencing result to obtain the SNP information is not particularly limited.
- SNP information in the obtained sequencing result can be determined by aligning the obtained sequencing data with a reference sequence.
- the reference sequence used is a known human genome sequence, for example Hg19 (NCBI Build 37).
- those skilled in the art may also use other known sequences as reference sequences; for example, the sequences of known SNPs may be used as the reference for alignment.
- the method and software employed for the alignment are not particularly limited.
- the alignment between the sequencing data and the reference sequence can be performed using SOAP/SOAP2 software.
- the sequencing data may also be assembled first, and the assembly result then aligned with a reference sequence. The sequencing data contained in the sequencing result can thereby be effectively analyzed, so that the SNPs contained in the sequencing result can be effectively determined and the efficiency of determining whether the individual has an abnormal state can be improved.
- SOAP/SOAP2 software provides accurate alignment of short-sequence data from Illumina sequencing systems.
- statistics such as the data quality value, the alignment rate, the GC content, the duplication rate, the genome coverage, and the sequencing depth may be computed from the alignment result, and the sequencing data is quality-controlled according to this information.
- useless data that cannot be aligned or that is duplicated is removed, giving a valid data set for subsequent analysis.
- the obtained SNP information may also be filtered, for example, by filtering the known SNPs contained in the sequencing result based on preset filtering conditions.
- the filtering conditions that can be employed are at least one of the following:
- the SNP calling quality value is greater than 20.
- SNP calling quality value refers to the scoring result given to the confidence of each SNP during the operation of the SOAP analysis software.
- SNP site sequencing depth is greater than 8;
- the depth of the SNP locus is less than 5 times the average depth of the genome
- SNP site copy number is not greater than 2;
- the distance between the SNP site and the nearest other SNP site is greater than 5. The expression "distance between two SNP sites" as used herein refers to the number of bases between the two SNP sites; for example, "the distance is greater than 5" means that more than five bases separate the two sites.
- the obtained SNP results can be effectively filtered, so that the accuracy and efficiency of determining whether an individual has an abnormal state can be effectively improved.
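The filtering conditions listed above can be sketched in a few lines of Python. This is a minimal illustration rather than the applicants' implementation: the `SnpCall` record, its field names, and the `filter_snps` function are hypothetical, all five conditions are applied together (the text allows any subset), and neighbour distances are measured against every call in the input, including calls that fail the other conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    # Hypothetical record; the field names are not taken from the source.
    chrom: str          # chromosome name, e.g. "chr11"
    pos: int            # position of the SNP site on the chromosome
    quality: int        # SNP calling quality value
    depth: int          # sequencing depth at the site
    copy_number: float  # estimated copy number of the site


def filter_snps(calls, mean_depth):
    """Keep the calls that satisfy all five filtering conditions."""
    # Collect positions per chromosome so neighbour distances can be checked.
    positions = {}
    for call in calls:
        positions.setdefault(call.chrom, []).append(call.pos)

    def far_from_neighbours(call):
        # "Distance" is the number of bases between two sites, which is
        # abs(p1 - p2) - 1 here, and it must be greater than 5.
        return all(
            abs(call.pos - other) - 1 > 5
            for other in positions[call.chrom]
            if other != call.pos
        )

    return [
        c for c in calls
        if c.quality > 20                # calling quality value > 20
        and c.depth > 8                  # site depth > 8
        and c.depth < 5 * mean_depth     # site depth < 5x average depth
        and c.copy_number <= 2           # copy number not greater than 2
        and far_from_neighbours(c)       # nearest other SNP > 5 bases away
    ]
```

Passing the average depth as a parameter lets the same function serve whole-genome data and target-region data, where the relevant average differs.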
- after the sequencing data is obtained by sequencing and analyzed, the SNP information contained in the sequencing result can be obtained. The SNP can then be analyzed to determine whether the individual has an abnormal state associated with the SNP, such as a single-gene disease associated with the SNP.
- the type of SNP that can be employed is not particularly limited.
- the known SNP is located in the human chromosome HBB gene region.
- the known SNP is at least one selected from the group consisting of: rs33985472, rs63750954, rs63751128, rs33978907, rs34029390, rs34809925, rs33953406, rs33910569, rs33910569, rs33971634, rs3392539 rs33946267, rs35485099, rs36015961, rs33930977, rs35256489, rs33952266, rs33952266, rs33952266, rs33914668, rs33913413, rs33913413, rs3434
- the SNP information of the individual can be efficiently obtained by the above method, and the desired SNP information can be extracted for the relevant gene of the specific disease to be studied, and the typing result of the target gene can be obtained.
- disease annotation can be performed by comparison with existing databases to determine whether the corresponding SNP variation causes disease, and the embryo or individual corresponding to the single cell or micro sample can be judged accordingly.
- the corresponding embryo or individual is judged to be a carrier of the Mendelian genetic disease gene.
- the present invention provides a system 1000 for determining whether an individual has an abnormal condition.
- the system 1000 includes: a sequencing library construction device 100, a sequencing device 200, an SNP determining device 300, and an abnormal state determining device 400.
- the sequencing library construction device 100 is configured to construct a sequencing library for a nucleic acid sample of an individual.
- the sequencing device 200 is coupled to the sequencing library construction device 100, and thus, can be used to sequence the constructed sequencing library to obtain sequencing results composed of a plurality of sequencing data.
- SNP determining device 300 is coupled to the sequencing device for determining a known SNP contained in the sequencing result based on the obtained sequencing data.
- the abnormal state determining device 400 is connected to the SNP determining device and is used to determine, based on the known SNPs previously determined to be included in the sequencing result, whether the individual has an abnormal state associated with the known SNPs.
- the method of determining whether an individual has an abnormal state described above can be effectively performed using a system for determining whether an individual has an abnormal state according to an embodiment of the present invention. The SNPs contained in the nucleic acid sample are thereby efficiently determined by sequencing, and since these SNPs are related to abnormal states, it is possible to effectively determine whether the source individual of the nucleic acid sample has an abnormal state associated with these SNPs.
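The connection order of the four devices can be pictured as a chain of functions, each consuming the output of the previous one. The sketch below is only a schematic under assumed names: `run_system`, the type aliases, and the four callables are hypothetical stand-ins for devices 100 to 400 and say nothing about how each device works internally.

```python
from typing import Callable, List

# Hypothetical type aliases for the data handed from one device to the next.
Library = List[str]  # fragments prepared for sequencing
Reads = List[str]    # sequencing data
Snps = List[str]     # identifiers of known SNPs found in the sequencing data


def run_system(
    sample: str,
    build_library: Callable[[str], Library],   # stands in for device 100
    sequence: Callable[[Library], Reads],      # stands in for device 200
    determine_snps: Callable[[Reads], Snps],   # stands in for device 300
    determine_state: Callable[[Snps], bool],   # stands in for device 400
) -> bool:
    """Chain the four stages in the order in which the devices are connected."""
    library = build_library(sample)
    reads = sequence(library)
    snps = determine_snps(reads)
    return determine_state(snps)
```

Keeping each stage behind its own callable mirrors the broad reading of "connected" in the description: two stages may share one physical apparatus as long as the output of one reaches the next.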
- the system may further comprise a nucleic acid sample extraction device 101.
- the nucleic acid sample extraction device 101 is adapted to extract at least a portion of the individual's whole genome DNA from a single cell or a micro sample of the individual.
- the system 1000 can further comprise a biological sample separation device adapted to isolate a single cell or a micro sample from at least one biological sample of the individual selected from the group consisting of blood, tissue, urine, gametes, fertilized eggs, blastomeres, and embryos.
- whether an individual has an abnormal state can thus be predicted and evaluated from a small sample of the individual, which improves the efficiency and reduces the cost of determining whether the individual has an abnormal state.
- these samples can be easily obtained from an organism, and the sample type can be chosen for a particular disease so that disease-specific analysis can be carried out.
- the biological sample separation device is adapted to separate the single cell or micro sample by at least one selected from the group consisting of a dilution method, a mouth pipette separation method, micromanipulation, microdissection, flow cytometry, and microfluidics.
- the above-described method of determining whether an individual has an abnormal state can be effectively implemented using a system for determining whether an individual has an abnormal state according to an embodiment of the present invention.
- the single cells of the biological sample can be obtained efficiently and conveniently, so that the subsequent operations can be performed, whereby the single cells can be effectively separated from the individual, thereby further improving the efficiency of subsequently determining whether the individual has an abnormal state.
- the sequencing library construction device may further comprise a nucleic acid sample amplification unit adapted to amplify the nucleic acid sample to obtain a nucleic acid sample amplification product.
- the amplification unit is adapted to perform at least one selected from the group consisting of PEP-PCR, DOP-PCR, OmniPlex WGA, and MDA.
- the efficiency of amplifying whole genome DNA can be further improved, thereby further improving the efficiency of subsequently determining whether an individual has an abnormal state.
- the operation mode of the "nucleic acid sample extraction device" is not particularly limited as long as the relevant nucleic acid sample can be obtained and the obtained nucleic acid sample is suitable for subsequent operations; for example, whole genome DNA from a single cell or a micro sample can be released by lysing the single cell with a lysis solution, and the single-cell whole genome DNA is then collected.
- the sequencing library construction device may further include a screening unit.
- a nucleic acid probe is disposed in the screening unit to screen the nucleic acid sample amplification product and obtain a nucleic acid sample amplification product from a predetermined region, and the sequencing library is constructed for the nucleic acid sample amplification product from the predetermined region. The known SNPs included in the predetermined region can thereby be effectively determined, which improves the efficiency and accuracy of determining the abnormal state related to the SNPs according to the known SNPs included in the predetermined region, such as an exon region.
- the nucleic acid probe may be provided in the form of a chip.
- the efficiency of screening by using a nucleic acid probe can be further improved, thereby further improving the efficiency of subsequently determining whether an individual has an abnormal state.
- the sequencing device is at least one selected from Illumina Hiseq2000, Genome Analyzer, a SOLiD sequencing system, an Ion Torrent, an Ion Proton, 454, a PacBio RS sequencing system, a Helicos tSMS sequencing device, and a nanopore sequencing device. The high-throughput and deep-sequencing characteristics of these sequencing devices can thereby be used to further improve the efficiency of determining whether an individual has an abnormal state.
- the sequencing is performed using an Illumina Hiseq 2000, the sequencing data being 90 bp in length.
- the efficiency of subsequent SNP analysis can be further improved, thereby improving the efficiency of determining whether an individual has an abnormal state.
- the SNP determining device further includes an alignment unit configured to determine the known SNPs included in the sequencing result by aligning the sequencing data with a reference sequence.
- the comparison unit is adapted to perform alignment using SOAP/SOAP2 software. Thereby, the sequencing data included in the sequencing result can be effectively analyzed, so that the SNP included in the sequencing result can be effectively determined, and the efficiency of determining whether the individual has an abnormal state can be improved.
- the SNP determining device may further include an SNP filtering unit adapted to filter the known SNPs included in the sequencing result based on the following filtering conditions: the SNP calling quality value is greater than 20; the sequencing depth of the SNP site is greater than 8; the depth of the SNP site is less than 5 times the average depth of the genome; the copy number of the SNP site is not greater than 2; and the distance between the SNP site and the nearest other SNP site is greater than 5.
- the known SNP is located in the human chromosome HBB gene region.
- the known SNP is at least one selected from the group consisting of: rs33985472, rs63750954, rs63751128, rs33978907, rs34029390, rs34809925, rs33953406, rs33910569, rs33910569, rs33971634, rs3392539 rs33946267, rs35485099, rs36015961, rs33930977, rs35256489, rs33952266, rs33952266, rs33952266, rs33914668, rs33913413, rs33913413, rs34483965, rs34793594, rs35703285, rs
- the system for determining whether an individual has an abnormal state can effectively implement the foregoing method of determining whether an individual has an abnormal state. It can therefore be effectively judged whether the subject suffers from thalassemia, especially β-thalassemia.
- the term "connected" as used herein is to be understood broadly: the connection may be direct or indirect, and may even involve the same container or device, as long as a functional link is achieved. For example, nucleic acid sample extraction and nucleic acid sample amplification can be carried out in the same apparatus; that is, after the nucleic acid sample is extracted, the amplification can be performed in the same apparatus or container without transferring the extracted nucleic acid sample to other equipment or containers, as long as the conditions within the device (including the reaction conditions and the composition of the reaction system) are changed to suit the nucleic acid sample amplification reaction. Such a functional connection between nucleic acid sample extraction and nucleic acid sample amplification is also considered to be covered by the term "connected".
- a single-gene disease is detected on a single cell or a micro sample using a method comprising the following steps:
- S3 sequencing sample preparation (library preparation);
- S7 SNP extraction and filtration to obtain the typing result of the target gene
- IVF-PGD in vitro fertilization-embryo preimplantation genetic diagnosis
- sperm and eggs are fertilized in vitro and cultured in vitro to the third day to form blastomeres at 5-8 cell stage.
- a conventional biopsy was performed. Under the micromanipulator, a single blastomere cell was removed, placed in a PCR tube containing the lysis solution, and stored at -80 °C. The biopsied embryo was cultured further until the fifth day, when it reached the blastocyst stage, and could then be vitrified or used directly for implantation.
- Single-cell whole-genome amplification was performed using Qiagen's REPLI-g Mini Kit according to the manufacturer's protocol.
- the blastomere single cells were first subjected to alkaline lysis, and the amplification reaction solution was then added for isothermal amplification at 30 °C.
- a DNA sequencing library was constructed using the Illumina Paired-End DNA Sample Prep Kit according to the manufacturer's protocol. Three libraries were prepared for the blastomere single-cell whole genome amplification products obtained in the previous section. The expected inserts of the library were 200 bp, 350 bp, and 500 bp, respectively. The actual insert size is shown in Table 1.
- High throughput sequencing was performed using the Illumina Hiseq 2000 sequencing system.
- a strategy for whole genome sequencing of blastomere single cells is employed.
- the qualified library of blastomere single-cell amplification products was processed on the cBot and then run on the Hiseq2000 sequencer.
- the sequencing length was 90 bp with paired-end (bidirectional) sequencing. One lane was sequenced for each library, for a total of three lanes.
- Sequencing data were aligned to the human reference genome sequence (Hg19; NCBI Build 37) using SOAP2 software, allowing up to two base mismatches per alignment.
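The "up to two base mismatches" criterion can be illustrated with a brute-force search. This is not how SOAP2 works internally (it uses an index-based search and has many more options); the functions `mismatches` and `candidate_positions` below are hypothetical names for a toy sketch that only shows what the mismatch limit means for a read placed against a reference string.

```python
def mismatches(read: str, reference: str, start: int) -> int:
    """Count mismatched bases when `read` is placed at 0-based `start`."""
    window = reference[start:start + len(read)]
    if len(window) < len(read):
        raise ValueError("read extends past the end of the reference")
    return sum(1 for a, b in zip(read, window) if a != b)


def candidate_positions(read: str, reference: str, max_mismatches: int = 2):
    """Return every 0-based start where the read fits within the mismatch limit."""
    last_start = len(reference) - len(read)
    return [
        start
        for start in range(last_start + 1)
        if mismatches(read, reference, start) <= max_mismatches
    ]
```

A read with exactly one candidate position corresponds to a uniquely aligned read; a read with several candidates is ambiguous, and a read with none is the kind of unaligned data that the quality-control step discards.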
- the raw data quality, GC content, actual insert fragment size, alignment rate, repetition rate, and genomic coverage and sequencing depth were calculated based on the alignment results. See Table 1 for specific information.
- the quality of the sequencing data is controlled by these statistical results. In this example, we obtained a total sequencing depth of 38.25X across the whole genome, and all data statistics met the standard. Data that could not be aligned to the reference genome and duplicated data were then removed, giving a valid data set for SNP analysis.
- Sample PE reads PE Unique Coverage Mean Depth Duplication Rate
- BLS1 is the ID of the blastomere single-cell sample
- BLS1(total) is the statistical result of combining the data of the three lanes of the BLS1 sample
- Q20 (%) is the ratio of the data of quality value above 20 to the total amount of data.
- GC (%) indicates the actual GC content percentage of the sequencing data
- Insert Size indicates the actual library insert size of the sequencing data
- Clean Reads indicates the amount of read data remaining after the low-quality read is removed
- PE-alignment indicates the ratio of the read data for which both ends of the pair align to the reference genome to the total data volume
- PE reads indicates the amount of read data for which both ends of the pair align to the reference genome
- PE Unique indicates the amount of read data for which both ends of the pair align uniquely to the reference genome
- Coverage represents the coverage of the whole genome
- Mean Depth represents the average depth of the whole genome
- Duplication Rate represents the ratio of repeated read data to total read data.
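As a rough illustration of how the Coverage, Mean Depth, and Duplication Rate columns relate to the underlying data, the following sketch computes them from toy inputs. The function names are hypothetical, and two definitions are assumptions rather than statements from the source: mean depth is averaged over every reference base including uncovered ones, and a read counts as a duplicate when its key (for example its pair of alignment coordinates) has already been seen.

```python
def coverage_and_mean_depth(per_base_depth):
    """Return (coverage fraction, mean depth) for a list of per-base depths."""
    total_bases = len(per_base_depth)
    if total_bases == 0:
        raise ValueError("per_base_depth must not be empty")
    covered = sum(1 for depth in per_base_depth if depth > 0)
    return covered / total_bases, sum(per_base_depth) / total_bases


def duplication_rate(read_keys):
    """Fraction of reads whose key repeats an earlier read."""
    keys = list(read_keys)
    if not keys:
        raise ValueError("read_keys must not be empty")
    return (len(keys) - len(set(keys))) / len(keys)
```

For four reference bases with depths 0, 2, 4, and 6, the first function returns a coverage of 0.75 and a mean depth of 3.0.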
- This embodiment uses the SOAPsnp software to perform SNP calling on the valid data set obtained above, according to the SOAPsnp instructions (see http://soap.genomics.org.cn/soapsnp.html, which is incorporated herein by reference), to obtain the SNP data set.
- the SNP data set obtained above is filtered, and the filtering conditions are as follows:
- SNP calling quality value is greater than 20;
- the sequencing depth of the site is greater than 8.
- the depth of the site is less than 5 times the average depth of the genome
- the copy number of the site is not more than 2;
- the distance between the SNP and the nearest SNP is greater than 5.
- from the filtered SNP data, the SNPs located in the target gene region are extracted for the disease-related genes to be detected.
- the gene for β-thalassemia is examined to detect whether the corresponding blastomere embryo has β-thalassemia or is a carrier of the β-thalassemia disease gene. Therefore, in this example, the SNP sites located in the HBB gene region of chromosome 11 were extracted, and the specific information is shown in Table 2.
- Chromosome indicates the chromosome number
- Locus indicates the site number of the base corresponding to the SNP on the chromosome
- Ref indicates the base type of the corresponding site on the human reference genome in the database
- Blastomere indicates the genotype information of the corresponding SNP site in the blastomere single-cell data
- Mutation indicates the mutation type of the corresponding site recorded in the database
- the SNP ID indicates the ID number of the SNP site in the database
- Gene indicates which gene region the SNP site is located in.
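Extracting the SNPs of a target gene region amounts to an interval query over records shaped like the rows of Table 2. The sketch below assumes a hypothetical `SnpRecord` type whose fields mirror the Chromosome, Locus, Ref, and sample genotype columns; the region boundaries are passed in by the caller and are not defined by the source.

```python
from typing import NamedTuple


class SnpRecord(NamedTuple):
    # Hypothetical record mirroring some columns of Table 2.
    chromosome: str  # "Chromosome" column
    locus: int       # "Locus" column, position on the chromosome
    ref: str         # "Ref" column, base on the reference genome
    genotype: str    # genotype observed in the sample, e.g. "CT"


def snps_in_region(records, chromosome, start, end):
    """Return records on `chromosome` with start <= locus <= end, sorted by locus."""
    hits = [
        record for record in records
        if record.chromosome == chromosome and start <= record.locus <= end
    ]
    return sorted(hits, key=lambda record: record.locus)
```

In practice the interval would be taken from the gene annotation that matches the reference build used for alignment, so that the loci and the gene boundaries share one coordinate system.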
- the disease annotation is further made based on the SNP information of the target gene filtered and extracted as described above.
- heterozygous SNP variants were found at positions 5247141 and 5247791 of chromosome 11, both of which are located in the HBB gene region. Homozygous mutations at these two sites can cause β-thalassemia.
- both sites carry heterozygous mutations, indicating that the embryo corresponding to the blastomere single cell is a carrier of the β-thalassemia pathogenic gene.
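The reasoning in this annotation step (a homozygous mutation at a pathogenic site indicates disease, a heterozygous one indicates carrier status) can be written as a small decision rule. The sketch is a simplification under stated assumptions: `zygosity` and `interpret_sites` are hypothetical names, genotypes are two-letter allele strings, every site passed in is assumed to be a known pathogenic site, and compound heterozygosity and phasing are ignored, as in the text.

```python
def zygosity(ref: str, genotype: str) -> str:
    """Classify a two-allele genotype string relative to the reference base."""
    if len(genotype) != 2:
        raise ValueError("genotype must contain exactly two alleles")
    first, second = genotype
    if first == ref and second == ref:
        return "homozygous reference"
    if first == second:
        return "homozygous variant"
    return "heterozygous"


def interpret_sites(sites):
    """Summarise (ref, genotype) pairs observed at known pathogenic sites.

    Any homozygous variant gives "affected"; otherwise any heterozygous
    site gives "carrier"; otherwise the result is "normal".
    """
    calls = [zygosity(ref, genotype) for ref, genotype in sites]
    if "homozygous variant" in calls:
        return "affected"
    if "heterozygous" in calls:
        return "carrier"
    return "normal"
```

Two heterozygous sites therefore yield "carrier", which matches the conclusion drawn for the blastomere in this example.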
- the blood sample of this example is derived from a human individual with a normal phenotype. A small amount of blood was taken and centrifuged to separate the leukocyte layer. The leukocytes were washed with PBS and suspended in PBS droplets, and individual leukocytes were separated with a mouth pipette, placed in 1-2 μl of alkaline cell lysis solution, and frozen at -80 °C for more than 30 min.
- Qiagen's REPLI-g Mini Kit was used. According to the manufacturer's protocol, blood single cells were subjected to alkaline lysis after treatment at 65 °C for 10 minutes, and the amplification reaction solution was then added for isothermal amplification at 30 °C.
- the Agilent chip used in this example targets a region about 2.1 Mb in size, comprising all exon regions of one hundred single-gene-disease-associated genes.
- the captured and constructed sequencing libraries were subjected to high throughput sequencing.
- This example uses the Illumina Hiseq 2000 sequencing system for high throughput sequencing.
- a strategy of chip capture sequencing of blood single cells is adopted; the target region comprises all exon regions of one hundred single-gene-disease-related genes and is about 2.1 Mb in size.
- the qualified chip capture library prepared from the blood single-cell amplification products was processed on the cBot and then run on a Hiseq2000 sequencer.
- the sequencing length was 90 bp, and Pair End was bidirectionally sequenced.
- the amount of sequencing data was expected to be 1 to 2 G bases.
- Sequencing data were aligned to the human reference genome sequence (Hg19; NCBI Build 37) using SOAP2 software, allowing up to two base mismatches per alignment.
- the raw data quality, GC content, alignment rate, repetition rate, and genomic coverage and sequencing depth were calculated based on the alignment results.
- the specific information is shown in Table 3.
- the quality of the sequencing data is controlled by these statistical results. In this example, we obtained a total sequencing depth of 457X in the target region, and all data statistics met the standard. Data that could not be aligned to the reference genome and duplicated data were then removed, giving a valid data set for SNP analysis.
- GC (%) indicates the actual GC content percentage of the sequencing data
- Reads indicates the amount of read data remaining after the read of the low quality value is removed
- Production indicates the amount of base data calculated according to the Reads value
- PE-alignment indicates the ratio of the read data for which both ends of the pair align to the reference genome to the total data volume
- PE Unique indicates the amount of read data for which both ends of the pair align uniquely to the reference genome
- Coverage indicates the coverage of the whole genome
- Mean Depth indicates the average depth of the whole genome
- Duplication Rate indicates the ratio of the repeated read data to the total read data
- Specificity (Reads) indicates the ratio of the amount of read data aligned to the target region to the total read data
- Specificity (Bases) indicates the ratio of the amount of base data aligned to the target region to the total amount of base data
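The two specificity figures in this legend can be computed from aligned read intervals and target intervals. The sketch below is an assumption-laden illustration: `overlap` and `capture_specificity` are hypothetical names, all intervals are half-open `(start, end)` pairs on a single chromosome, the target intervals are assumed not to overlap one another, and a read is counted as on-target when at least one of its bases falls inside a target.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def capture_specificity(reads, targets):
    """Return (read-level specificity, base-level specificity)."""
    if not reads:
        raise ValueError("reads must not be empty")
    on_target_reads = 0
    on_target_bases = 0
    total_bases = 0
    for start, end in reads:
        if end <= start:
            raise ValueError("each read must have positive length")
        # Bases of this read that fall inside any target interval.
        covered = sum(overlap(start, end, t_start, t_end) for t_start, t_end in targets)
        if covered > 0:
            on_target_reads += 1
        on_target_bases += covered
        total_bases += end - start
    return on_target_reads / len(reads), on_target_bases / total_bases
```

Base-level specificity is never higher than read-level specificity under these definitions, because a read that only partly overlaps a target counts fully at the read level but only partly at the base level.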
- This embodiment uses the SOAPsnp software to perform SNP calling on the valid data set obtained above, according to the SOAPsnp instructions (see http://soap.genomics.org.cn/soapsnp.html, which is incorporated herein by reference), to obtain the SNP data set.
- the SNP data set obtained above is filtered, and the filtering conditions are as follows:
- SNP calling quality value is greater than 20;
- the sequencing depth of the site is greater than 8.
- the depth of the site is less than 5 times the average depth of the target region
- the copy number of the site is not more than 2;
- the distance between the SNP and the nearest SNP is greater than 5.
- from the filtered SNP data, the SNPs located in the target gene region are extracted for the disease-related genes to be detected.
- the gene for β-thalassemia is examined to detect whether the corresponding blastomere embryo has β-thalassemia or is a carrier of the β-thalassemia disease gene. Therefore, in this example, the SNP sites located in the HBB gene region of chromosome 11 were extracted, and the specific information is shown in Table 4.
- Chromosome indicates the chromosome number
- Locus indicates the site number of the base corresponding to the SNP on the chromosome
- Ref indicates the base type of the corresponding site on the human reference genome in the database
- Blood Cell indicates the genotype information of the corresponding SNP site in the blood single-cell data
- Mutation indicates the mutation type of the corresponding site recorded in the database
- the SNP ID indicates the ID number of the SNP site in the database
- Gene indicates which gene region the SNP site is located in.
- the SNP information of the target gene filtered and extracted above is further subjected to disease annotation.
- no heterozygous or homozygous mutation site was detected in the HBB gene region, indicating that the individual corresponding to the blood single cell in the present example is not a beta thalassemia disease patient, nor is it a beta thalassemia disease gene carrier.
- the beta thalassemia disease gene region is a normal genotype.
- embodiments of the present invention enable a method for detecting Mendelian genetic diseases (single-gene diseases) on single-cell or micro-samples based on high-throughput sequencing.
- the invention can be effectively applied to the construction and sequencing of a DNA sequencing library of sample DNA, and the obtained library has good quality and accurate sequencing results.
Abstract
The present invention provides a method and a system for determining whether an individual has an abnormal state. The method for determining whether the individual has an abnormal state comprises: constructing a sequencing library for a nucleic acid sample of the individual; sequencing the sequencing library to obtain a sequencing result composed of a plurality of sequencing data; determining, based on the sequencing data, a known single nucleotide polymorphism (SNP) contained in the sequencing result; and determining, based on the known SNP, whether the individual has an abnormal state associated with the known SNP.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2012/080500 WO2014029093A1 (en) | 2012-08-23 | 2012-08-23 | Method and system for determining whether individual is in abnormal state |
HK15109589.1A HK1208889A1 (en) | 2012-08-23 | 2012-08-23 | Method and system for determining whether individual is in abnormal state |
CN201280074982.8A CN104508141A (en) | 2012-08-23 | 2012-08-23 | Method and system for determining whether individual has abnormal state |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2012/080500 WO2014029093A1 (en) | 2012-08-23 | 2012-08-23 | Method and system for determining whether individual is in abnormal state |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014029093A1 | 2014-02-27 |
Family
ID=50149356
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2012/080500 WO2014029093A1 (en) | Method and system for determining whether individual is in abnormal state | 2012-08-23 | 2012-08-23 |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN104508141A (fr) |
HK (1) | HK1208889A1 (fr) |
WO (1) | WO2014029093A1 (fr) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018133547A1 (en) * | 2017-01-19 | 2018-07-26 | 人和未来生物科技(长沙)有限公司 | Method for constructing a library for non-invasive prenatal detection of fetal β-thalassemia gene mutations, detection method and kit |
WO2018133546A1 (en) * | 2017-01-19 | 2018-07-26 | 人和未来生物科技(长沙)有限公司 | Construction method, detection method and kit for a non-invasive prenatal fetal alpha-thalassemia gene mutation detection library |
CN110093413A (en) * | 2019-04-09 | 2019-08-06 | 深圳市卫生健康发展研究中心 | Primer set and kit for detecting β-thalassemia |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101374963A (en) * | 2005-12-22 | 2009-02-25 | 凯津公司 | Method for AFLP-based high-throughput polymorphism detection |
2012
- 2012-08-23 CN CN201280074982.8A patent/CN104508141A/zh active Pending
- 2012-08-23 WO PCT/CN2012/080500 patent/WO2014029093A1/fr active Application Filing
- 2012-08-23 HK HK15109589.1A patent/HK1208889A1/xx unknown
Non-Patent Citations (3)
Title |
---|
LAM, K.W.G. ET AL.: "Noninvasive Prenatal Diagnosis of Monogenic Diseases by Targeted Massively Parallel Sequencing of Maternal Plasma: Application to beta Thalassemia.", CLINICAL CHEMISTRY, vol. 58, no. 10, 15 August 2012 (2012-08-15), pages 1 - 9 * |
SABATH, D.E. ET AL.: "A Multiplex Approach to the Molecular Diagnosis of B-Thalassemia.", THE JOURNAL OF MOLECULAR DIAGNOSTICS., vol. 13, no. 4, 2011, pages 369 - 370 * |
WEI, XIAOMING ET AL.: "Identification of Sequence Variants in Genetic Disease-Causing Genes Using Targeted Next-Generation Sequencing.", PLOS ONE., vol. 6, no. 12, 21 December 2011 (2011-12-21), pages 1 - 10 * |
Also Published As
Publication number | Publication date |
---|---|
HK1208889A1 (en) | 2016-03-18 |
CN104508141A (zh) | 2015-04-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 12883246; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 02/07/2015) |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 12883246; Country of ref document: EP; Kind code of ref document: A1 |