WO2014183270A1 - Method and device for detecting chromosomal structural abnormalities - Google Patents
Method and device for detecting chromosomal structural abnormalities
- Publication number
- WO2014183270A1 (application PCT/CN2013/075622)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Definitions
- The invention relates to the technical field of genomics and bioinformatics, in particular to a method and a device for detecting abnormalities of chromosome structure.
- Karyotype analysis: for example, G-band karyotype analysis judges chromosome structural abnormalities from the distribution of 400 to 600 bands, so it can usually detect abnormalities only at the chromosome level. In the best case, deletions and duplications larger than 5 Mbp can be detected; smaller fragments (<5 Mbp) cannot be detected. Moreover, the method requires culturing living cells, so the cells must remain viable.
- Fluorescence in situ hybridization (FISH): deletions, duplications, and balanced translocations of smaller fragments can be detected, but the chromosome fragments to be detected must be determined in advance so that the corresponding probes can be prepared, and the method is therefore limited by probe design. Because FISH cannot detect unknown regions, it is often used to verify test results.
- Microarray methods: these include two probe designs, one based on single nucleotide polymorphisms (SNPs) and one based on copy number variation (CNV); both have limitations similar to those of FISH.
- A method for detecting an abnormality in chromosome structure includes the steps of: obtaining a whole-genome sequencing result of a target individual, the result including a plurality of read pairs, each read pair consisting of two read sequences located at the two ends of a sequenced chromosome fragment, where the two reads of each pair come either from the positive and negative strands of the corresponding fragment, respectively, or both from the same strand (positive or negative) of that fragment;
- aligning the sequencing result with a reference sequence to obtain an abnormal matching set, which includes read pairs of a first type, whose two read sequences match different chromosomes of the reference sequence;
- clustering the read sequences in the abnormal matching set according to their matched positions, so that each cluster contains the reads from one end of a group of read pairs and the corresponding reads from the other end lie in another cluster;
- An apparatus for detecting an abnormality of chromosome structure includes: a data input unit for inputting data; a data output unit for outputting data; a storage unit for storing data, including an executable program; and a processor, coupled to the data input unit, the data output unit, and the storage unit, for executing the executable program stored in the storage unit, where executing the program includes performing the foregoing method for detecting an abnormality of chromosome structure.
- A computer-readable storage medium storing a program for execution by a computer is also provided. Those skilled in the art will understand that, when the program is executed, all or part of the steps of the above method for detecting chromosome structural abnormalities can be completed by instructing the relevant hardware.
- The storage medium may include a read-only memory, a random access memory, a magnetic disk, an optical disk, and the like.
- The method according to the present invention aligns the whole-genome sequencing result with the reference sequence to obtain read pairs that match different chromosomes, which enables screening for translocation-type chromosome structural abnormalities, and it further improves the accuracy and reliability of the results by clustering and filtering, so that analytically meaningful results can be obtained.
- FIG. 1 is a schematic diagram of a pair of Reads obtained by double-end sequencing according to an embodiment of the present invention.
- FIG. 2 is a schematic diagram of the first type of abnormally matched Reads according to an embodiment of the present invention.
- FIG. 3 is a schematic diagram of the second type of abnormally matched Reads according to an embodiment of the present invention.
- FIG. 4 is a schematic diagram of the third type of abnormally matched Reads according to an embodiment of the present invention.
- FIG. 5 is a schematic diagram of a pair of clusters located on different chromosomes according to an embodiment of the present invention.
- FIG. 6 is a schematic diagram of the RPK of "FA" in Experimental Example 1 according to an embodiment of the present invention.
- FIG. 7 is a schematic diagram of the RPK of "SON" in Experimental Example 1 according to an embodiment of the present invention.
- a method for detecting an abnormality in a chromosome structure comprising the steps of:
- Step1. Obtain the whole-genome sequencing results of the target individual.
- The sequencing results include read pairs (Reads). Each pair of Reads consists of two read sequences located at the two ends of a sequenced chromosome fragment; the two reads of a pair come either from the positive strand and the negative strand of the corresponding chromosome fragment, respectively, or both from the same strand (positive or negative) of that fragment.
- The sequenced chromosome fragments are usually obtained by fragmenting a chromosome sample from the target individual, and the library is then prepared according to the selected sequencing method.
- The sequencing method may be chosen according to the sequencing platform, including but not limited to CG (Complete Genomics), Illumina/Solexa, ABI/SOLiD, and Roche 454; a single-end or double-end sequencing library is prepared for the selected platform.
- For example, double-end sequencing can be performed, in which case the two read sequences Read1 and Read2 of each pair of Reads are derived from the positive strand Sp and the negative strand Sm of the corresponding chromosome fragment, respectively, as shown in the corresponding figure.
- The length L-r1 of Read1 and the length L-r2 of Read2 may be the same or different.
- If a single-read sequencing method is used to obtain the complete sequence of the whole chromosome fragment, it is also feasible to take a sequence of appropriate length from each end of that complete sequence to form a pair of Reads.
- In that case, the two read sequences of each pair of Reads both come from the positive strand, or both from the negative strand, of the corresponding chromosome fragment. This embodiment does not limit the specific sequencing method selected.
- The size of the library used for sequencing is referred to as L-lib.
- A library with an L-lib of 100 to 1000 bp is generally called a small-fragment library.
- A library with an L-lib of 2 Kbp, 5 to 6 Kbp, 10 Kbp, 20 Kbp, or 40 Kbp is called a large-fragment library.
- The present invention places no requirement on L-lib, but in general, provided the quality of library construction is ensured, a longer library is more likely to yield useful results, so L-lib ≥ 300 bp is preferred. Large-fragment libraries, such as 5 Kbp, or small-fragment libraries, such as 500 bp, can generally be used.
- The sequencing depth can be chosen to be 2× or greater for a large-fragment library and 5× or greater for a small-fragment library.
- The sequencing depth is preferably 2× for the large-fragment library and 5× for the small-fragment library.
- L-r1 and L-r2 are preferably 25 bp or more; below 25 bp, the unique alignment rate falls and the subsequent processing of alignment results becomes more complex. L-r1 and L-r2 also need not be too large, to avoid wasting data, so 50 bp is preferred. L-r1 and L-r2 have no fixed maximum, which may change as sequencing technology develops; for example, with current sequencing technology L-r1 and L-r2 generally do not exceed 150 bp.
- Step2. Align the sequencing results with the reference sequence.
- The reference sequence is a known sequence and may be any reference template, obtained in advance, for the biological category to which the target individual belongs.
- For example, the reference sequence can be HG19, provided by the National Center for Biotechnology Information (NCBI).
- A resource library containing more reference sequences may also be configured in advance, and before alignment a closer reference sequence may be selected according to factors such as the sex, ethnicity, and region of the target individual, to help obtain more accurate test results.
- During alignment, a pair of Reads is allowed at most n base mismatches, where n is preferably 1 or 2.
- Normal matching set (*.pair): this includes Reads for which the two read sequences Read1 and Read2 match the same chromosome of the reference sequence, the strand relationship of the matched positions is consistent with the strand relationship within the Reads, and the deviation between L-lib and the chromosome fragment length L-pr calculated from the matched positions is less than the preset threshold V-lib.
- V-lib is preferably 5% × L-lib to 15% × L-lib, more preferably 10% × L-lib.
- The above thresholds are set empirically according to the standard deviation of the library size.
- The standard deviation is about 15 bp for small-fragment libraries and about 50 bp for large-fragment libraries. The deviation between L-pr and L-lib can be considered acceptable within 3 standard deviations; for example, for a 500 bp library, a suitable range for L-pr is 455 bp to 545 bp.
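- As a worked illustration of the 3-standard-deviation rule, the acceptable L-pr range can be computed as below. This is a minimal sketch; the function name `lpr_range` and the parameter `k` are hypothetical and do not come from the patent text.

```python
def lpr_range(l_lib, sd, k=3.0):
    """Return the (low, high) bounds of acceptable L-pr.

    l_lib: library size L-lib in bp
    sd:    standard deviation of the library size in bp
    k:     number of standard deviations tolerated (assumed default: 3)
    """
    return (l_lib - k * sd, l_lib + k * sd)


# 500 bp small-fragment library with a standard deviation of about 15 bp
print(lpr_range(500, 15))
```

With the values quoted in the text (500 bp, 15 bp), the bounds are 455 bp and 545 bp.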
- The number of Reads can be obtained according to the matched positions. For example, the number of Reads per unit length (RPU) can be counted.
- The unit length can be set according to L-lib, for example to 1.5 to 4 times L-lib. If L-lib is 500 bp, the unit length can be set to 1 Kbp, and the RPU is then written as RPK.
- V-rm is preferably 10% to 30%, more preferably 20%.
- The average value of the RPU can be obtained by statistics or by estimation.
- The average value of the RPU can be estimated as sequencing depth × (unit length / L-lib). If RPU is not needed, *.pair need not be obtained.
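- The RPU estimate and the per-window count can be sketched as follows, assuming the estimate is the product of the sequencing depth and (unit length / L-lib). The function names `expected_rpu` and `rpu_by_window`, and the use of fixed, non-overlapping windows, are assumptions made for illustration only.

```python
from collections import Counter


def expected_rpu(depth, unit_length, l_lib):
    """Estimated average number of Reads per unit length."""
    return depth * (unit_length / l_lib)


def rpu_by_window(positions, unit_length):
    """Count Reads per window, keyed by window index.

    positions: matched start positions (bp) of normally matched Reads
               on one chromosome (hypothetical input format).
    """
    return Counter(pos // unit_length for pos in positions)


# 5x depth, 1 Kbp unit length, 500 bp library
print(expected_rpu(5, 1000, 500))
print(rpu_by_window([10, 500, 999, 1000, 2500], 1000))
```

Windows whose count departs strongly from `expected_rpu` are the ones an operator would inspect further.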
- The two read sequences of the first type of Reads match different chromosomes of the reference sequence; such Reads are associated with translocation-type structural abnormalities, such as balanced and unbalanced translocations.
- The figure shows a case of balanced translocation: Read1 of a pair of Reads matches chromosome chra and Read2 matches chromosome chrb.
- The dotted line connecting Read1 and Read2 indicates their positional relationship in the chromosome fragment (the same below), and pa and pb indicate the positions of the possible breakpoints.
- A breakpoint is the boundary point at which a chromosome structural abnormality occurs.
- The two read sequences of the second type of Reads match the same chromosome of the reference sequence, but L-pr is negative; such Reads are associated with tandem duplication structural abnormalities.
- Both Read1 and Read2 of a pair of Reads match chromosome chra, but the head-to-tail order of the matched positions is opposite to the head-to-tail order of Read1 and Read2 in the chromosome fragment. pa1 and pa2 indicate the possible start and end positions of the duplicated segment, L-sv indicates its length, and the dotted section in the middle of chra in the figure indicates an omitted length (the same below).
- The two read sequences of the third type of Reads match the same chromosome of the reference sequence, but L-pr is greater than L-lib and the deviation exceeds the preset threshold V-lib; such Reads are associated with deletion structural abnormalities.
- Both Read1 and Read2 of a pair of Reads match chromosome chra, and the head-to-tail order of the matched positions is the same as that of Read1 and Read2 in the chromosome fragment, but the distance exceeds the suitable range. pa1 and pa2 indicate the possible start and end positions of the deleted fragment, and L-sv indicates its length.
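- The normal set and the three abnormal types described above can be told apart with a few comparisons. The sketch below is illustrative only: the function name, the labels, and the signed `l_pr` input are assumptions, and the strand-consistency check required for the normal set is deliberately omitted.

```python
def classify_pair(chrom1, chrom2, l_pr, l_lib, v_lib):
    """Classify one pair of Reads from its alignment.

    chrom1, chrom2: chromosomes matched by Read1 and Read2
    l_pr:  signed fragment length computed from the matched positions
           (negative when the head-to-tail order is reversed)
    l_lib: library size L-lib
    v_lib: allowed deviation V-lib
    """
    if chrom1 != chrom2:
        return "type1_translocation"
    if l_pr < 0:
        return "type2_tandem_duplication"
    if l_pr - l_lib > v_lib:
        return "type3_deletion"
    if abs(l_pr - l_lib) < v_lib:
        return "normal"
    # Matched, but not covered by the three named types
    return "other_abnormal"


print(classify_pair("chr1", "chr2", 0, 500, 50))
print(classify_pair("chr1", "chr1", 20000, 500, 50))
```

Pairs that fall through to the last branch would still belong to the abnormal matching set, since they are matched but not normal.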
- The abnormal matching set is not limited to the above three types of Reads: any Reads, or single read sequence of a pair, that can be matched to the reference sequence but does not belong to the normal matching set can be counted in the abnormal matching set.
- One of ordinary skill in the art can associate the different patterns of abnormal matching with the corresponding chromosome structural abnormalities that may occur.
- Whether the positive and negative strands match or mismatch need not be distinguished within the abnormal matching set.
- Unmatched set (*.unmap): this includes Read sequences that cannot be matched to the reference sequence. They may be paired (neither end matches) or single-ended (the other Read of the pair matches).
- The single-ended Read sequences in *.unmap can be used for breakpoint assembly after the result clusters are obtained, to obtain a more accurate breakpoint range. If breakpoint assembly is not needed, *.unmap need not be obtained.
- Step3. Cluster the read sequences in *.sin according to their matched positions.
- The clustering can adopt various clustering algorithms, which is not limited in this embodiment.
- A simple method is to divide clusters according to a preset minimum inter-cluster distance V-cl: traverse the Read sequences sorted by position, starting from the first Read; if the distance between the second Read and the first is less than V-cl, place them in the same cluster and continue from the second Read, until the distance between the n-th Read and the (n-1)-th Read is greater than V-cl.
- The n-th Read then starts the second cluster, and the process repeats until all Read sequences have been traversed.
- During clustering, positive and negative strands need not be treated separately; Read sequences are clustered only by their matched positions on the chromosome.
- After clustering, each cluster contains the reads from one end of a group of Reads, and the corresponding reads from the other end lie in another cluster, so these two clusters can be referred to as a pair of clusters.
- FIG. 5 is a schematic diagram of a pair of clusters, cluster1 and cluster2, located on different chromosomes; paired clusters may of course also lie on the same chromosome.
- Each cluster preferably contains two or more Read sequences. A single Read that is more than V-cl away from both the preceding and the following Read can be discarded as abnormal data.
- V-cl is preferably not lower than L-lib. If it is set too low, there will be too many candidate clusters with too few Read sequences each, which hinders later screening and filtering and may increase false positives. If it is set too high, the breakpoint becomes harder to determine and its range widens. A value of 10 Kbp is therefore preferred.
- Depending on the clustering algorithm, V-cl can have different specific meanings, such as the distance between the centers of gravity of two adjacent clusters, or the distance between the two closest Read sequences of two adjacent clusters.
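- The simple distance-based clustering can be sketched as below. This is one possible reading, not the patent's implementation: V-cl is taken as the gap between adjacent sorted Read positions on a single chromosome, a gap exactly equal to V-cl keeps the Read in the current cluster, and the name `cluster_reads` and parameter `min_reads` are hypothetical.

```python
def cluster_reads(positions, v_cl, min_reads=2):
    """Group Read positions on one chromosome into clusters.

    A new cluster starts whenever the gap to the previous Read
    exceeds v_cl. Clusters with fewer than min_reads members
    (isolated Reads) are discarded as abnormal data.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        if current and pos - current[-1] > v_cl:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    return [c for c in clusters if len(c) >= min_reads]


# V-cl = 10 Kbp; the isolated Read at 40000 is dropped
print(cluster_reads([100, 5000, 9000, 40000, 100000, 104000], 10000))
```

In practice the Read sequences would first be grouped by chromosome, and each Read would carry an identifier so that its mate in the paired cluster can be found.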
- Step4. Filter the clusters obtained by clustering.
- Filtering removes as much potential interference as possible (such as sample contamination, sequencing errors, alignment errors, and noise) so that the results reflect true chromosome structural abnormalities; the filtering conditions can therefore be set according to actual needs and the types of interference expected.
- This embodiment preferably provides the following filtering methods. In practice, one or several of them may be used, in combination or separately:
- (A) According to the degree of compactness of the cluster: calculate the degree of compactness of each cluster, and filter out the clusters whose compactness does not satisfy the preset requirement R-va, together with the clusters paired with them.
- Various available mathematical methods can be used to calculate the degree of compactness of each cluster.
- For example, compactness can be expressed by the variance of the positions of the Read sequences in the cluster about the center or center of gravity of the cluster: the smaller the variance, the higher the compactness.
- When calculating the variance, Read sequences lying within 5% to 25% (preferably 20%) of the cluster length at both ends of the cluster may be discarded, to reduce the influence of peripheral data on the result.
- R-va may be set as a fixed threshold, for example requiring the variance to be below a fixed value, or as an elimination ratio, for example requiring the variance to rank within a preset lowest interval among all clusters, such as the lowest 2% to 10%, preferably 5%.
- The degree of compactness of the cluster reflects the stability of the Read distribution, indicating whether the Read sequences are concentrated in a small interval.
- In real data, true structural variation is submerged in abundant "environmental noise", but the effect of that noise across the whole genome is essentially uniform, so it tends toward an approximately even distribution over the whole sequence (although it may also be affected by factors such as GC (guanine and cytosine) content). Where structural variation really occurs, the Read sequences in the cluster usually show a trend resembling a normal distribution, so compactness measures such as variance reflect the differences between clusters well.
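- The variance-based compactness filter can be sketched as follows. Two readings are assumed here and are not confirmed by the text: the trimming fraction applies to the cluster span at each end, and R-va is applied as a rank cut-off. The function names are hypothetical.

```python
from statistics import pvariance


def trimmed_variance(positions, trim=0.20):
    """Variance of Read positions after dropping Reads that lie in
    the outer `trim` fraction of the cluster span at each end."""
    lo, hi = min(positions), max(positions)
    span = hi - lo
    kept = [p for p in positions
            if lo + trim * span <= p <= hi - trim * span]
    # Fall back to all Reads if trimming leaves too few of them
    if len(kept) < 2:
        kept = list(positions)
    return pvariance(kept)


def keep_most_compact(variances, keep_fraction=0.05):
    """Indices of clusters whose variance ranks in the lowest
    `keep_fraction` of all clusters (at least one is kept)."""
    n_keep = max(1, int(len(variances) * keep_fraction))
    order = sorted(range(len(variances)), key=lambda i: variances[i])
    return set(order[:n_keep])


print(trimmed_variance([0, 40, 50, 60, 100]))
print(keep_most_compact([5.0, 1.0, 9.0, 3.0], keep_fraction=0.5))
```

A cluster removed by this filter would also take its paired cluster with it, which the caller has to handle.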
- (B) According to the linear correlation of the paired clusters: calculate the linear correlation of the two clusters of each pair, and filter out the paired clusters whose linear correlation does not satisfy the preset requirement R-li.
- Various available mathematical methods can be used to calculate the linear correlation of a pair of clusters, such as the correlation coefficient of the two clusters: the higher the correlation coefficient, the higher the linear correlation.
- R-li may be set as a fixed threshold, for example requiring the correlation coefficient to be above a fixed value, or as an elimination ratio, for example requiring the correlation coefficient to rank within a preset highest interval among all clusters, such as the highest 2% to 10%, preferably 5%.
- Linear correlation focuses on the consistency of the Reads distribution within the paired clusters, that is, whether the distribution trends at the two ends of the Reads are basically the same, so it reflects the distribution inside the paired clusters well.
- Using both the compactness of the clusters (such as the variance) and the linear correlation of the paired clusters to filter the candidate clusters can achieve good results.
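- One way to compute such a correlation coefficient is the Pearson coefficient between the positions of the Read1 sequences in one cluster and the positions of their mates in the paired cluster. The choice of Pearson and the function name are assumptions; the text only says that a correlation coefficient can be used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between mate positions.

    xs[i] and ys[i] are the matched positions of the two ends of
    the same pair of Reads in cluster1 and cluster2.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined correlation, treated here as none
    return sxy / sqrt(sxx * syy)


print(pearson([1, 2, 3, 4], [10, 20, 30, 40]))
```

Depending on the orientation of the rearrangement, mate positions may run in opposite directions, so a filter might compare the absolute value against R-li.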
- (C) According to a control set of normal samples: compare the paired clusters with a preset control set containing a plurality of normal samples, and filter out the paired clusters for which the number of normal samples hit reaches a preset threshold V-con.
- A normal sample here is the set of result clusters obtained by applying the same "alignment-clustering-filtering" analysis to another normal individual of the same species as the target individual. To simplify comparison, all the Read sequences in a cluster can be merged into one fused value, so that a pair of clusters yields a pair of fused values (similar to a pair of Reads), and the fused value pairs are used for comparison.
- Comparison with the control set gives the frequency with which a result cluster occurs in normal individuals. A high frequency suggests that the result cluster is caused by sample properties, the experimental process, the sequencing process, environmental noise, or the like, and does not mean that such structural variation has occurred in the sample itself.
- Such a result cluster is a common false positive produced when different samples are analyzed by the same method and should be removed. Filtering the clusters with the control set therefore further reduces the probability of false positives and helps to obtain true structural variation results.
- V-con can be determined according to how the normal samples were established and their characteristics. For example, the ratio of V-con to the number of normal samples in the control set may be 3% to 10%, preferably 5% to 6%; if the control set contains 90 normal samples, 5 hits can be regarded as reaching the threshold.
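- The control-set comparison can be sketched as below. The representation is an assumption: each fused cluster is a `(chrom, start, end)` interval, each normal sample is a list of interval pairs, and a "hit" means both ends overlap. None of these names or formats comes from the patent text.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals overlap."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def count_hits(pair, control_set):
    """Number of normal samples containing a cluster pair that
    overlaps `pair` at both ends (in either order)."""
    left, right = pair
    hits = 0
    for sample in control_set:
        for ctl_left, ctl_right in sample:
            same = overlaps(left, ctl_left) and overlaps(right, ctl_right)
            swapped = overlaps(left, ctl_right) and overlaps(right, ctl_left)
            if same or swapped:
                hits += 1
                break  # count each normal sample at most once
    return hits


def passes_control(pair, control_set, v_con):
    """Keep the pair only if its hit count stays below V-con."""
    return count_hits(pair, control_set) < v_con


candidate = (("chr1", 1000, 3000), ("chr5", 70000, 72000))
controls = [
    [(("chr1", 2500, 4000), ("chr5", 71000, 73000))],
    [(("chr5", 69000, 70500), ("chr1", 900, 1100))],
    [(("chr2", 1000, 3000), ("chr5", 70000, 72000))],
]
print(count_hits(candidate, controls))
```

With 90 normal samples and the preferred ratio, `v_con` would be 5, as in the example in the text.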
- Auxiliary parameters include various parameters that help to confirm a structural abnormality, distinguish its type, or understand its details, for example the number of mismatches produced during alignment, the number of Reads supporting the cluster, the RPU value of the relevant region obtained from *.pair, and whether the cluster is located in an N region.
- Auxiliary parameters can be used in two ways. One is as a filtering condition: filtering requirements on the auxiliary parameters are set, and clusters that do not meet them are filtered out directly. The other is as a reference for judgment: the auxiliary parameters are provided together with the result clusters and assessed by manual analysis. The content of this section can therefore be applied in Step 4 (for filtering) and also in the next step, Step 5 (to assist manual analysis).
- This embodiment does not limit which use is chosen.
- The following sections list some auxiliary parameters and their relationship to the analysis of results. In actual use, they can be set as filtering conditions according to the following description, or used as an auxiliary basis for manual analysis. Different auxiliary parameters can be used in combination or separately.
- Mismatch number: the average number of mismatches of the Reads in a paired cluster should generally be no more than one or two (that is, one or two mismatches are allowed per pair of Reads), preferably no more than one. If the alignment requirement was already set this way, this parameter need not be considered again. If the alignment setting was looser, for example two mismatches allowed, the result clusters can be filtered or judged again by this parameter, for example by requiring an average of no more than one mismatch.
- Number of Reads supporting the cluster: that is, the number of Reads included in the paired cluster.
- The range over which L-lib affects the breakpoints is usually larger than the sum of the spans of the paired clusters.
- The range over which L-lib affects the breakpoints is generally around 2 times L-lib, for example between 1 and 4 times L-lib, and can be relaxed or tightened according to actual conditions.
- RPU value of the relevant region obtained from *.pair: different types of structural abnormality usually affect the RPU differently. For example, with a balanced translocation, the RPU on either side of the breakpoint does not change significantly, whereas with a deletion or a duplication, the RPU of the region between the breakpoints is significantly reduced or increased, respectively. The RPU value of the relevant region can therefore be used to further verify, or help determine, the occurrence of a chromosome structural abnormality.
- For a duplication, the RPU of the region between the breakpoints should be above the average, with a change exceeding V-rm;
- for a deletion, the RPU of the region between the breakpoints should be below the average, with a change exceeding V-rm.
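- The RPU check can be expressed as a comparison of the relative change against V-rm. Treating V-rm as a fraction of the average RPU is an assumption, as are the function name and the returned labels.

```python
def rpu_signal(region_rpu, mean_rpu, v_rm=0.20):
    """Compare the RPU between two breakpoints with the average RPU.

    Returns "increase" (consistent with a duplication), "decrease"
    (consistent with a deletion), or "no_significant_change"
    (consistent with, for example, a balanced translocation).
    """
    change = (region_rpu - mean_rpu) / mean_rpu
    if change > v_rm:
        return "increase"
    if change < -v_rm:
        return "decrease"
    return "no_significant_change"


print(rpu_signal(16, 10))
print(rpu_signal(5, 10))
```

The result is supporting evidence only; it would be read alongside the type of abnormally matched Reads that produced the cluster.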
- the RPU of the relevant area may be provided in a graphical, tabular or other easy-to-read manner, or the entire range of RPU changes may be provided in a graphical form, a table, etc., so that The operator understands the overall situation.
- the Reads alignment near the N zone (which includes the centromere and telomere regions) is relatively more complex with other regions, and if the obtained cluster is not located in the N region It can generally be considered that it can be judged based on the obtained information. If the obtained cluster is located in the N zone, more careful verification may be required, such as joint use of filtering conditions and auxiliary parameters, or may be combined with other external data, such as a table of target individuals. The final determination is made by the results of the type, and/or further precise sequencing of the breakpoints (eg, Sanger sequencing).
- Step 5. Perform data analysis on the filtered result clusters.
- The presence of result clusters after filtering indicates that a chromosome structural abnormality of the corresponding type may have occurred, so this step is not necessary if the goal is only to find out whether structural abnormalities may exist.
- The obtained result clusters can be further analyzed. Depending on the type of result cluster, the following analysis methods can be used:
- The position of the innermost Read is obtained, and a preset length is extended inward from that position as the range of the breakpoint. The innermost Read is defined as follows: if the cluster contains only left-end Reads, the rightmost Read is the innermost Read; if the cluster contains only right-end Reads, the leftmost Read is the innermost Read. This situation is usually associated with unbalanced translocations, where the Reads of one cluster all lie on one side of the breakpoint.
- The span of the breakpoint range extended from the innermost Read can be determined from L-lib, L-r1/L-r2, sequencing depth, etc., for example 0.5 to 2 times L-lib, and generally no more than 2 times L-lib.
- FIG. 2 shows a case of balanced translocation. Suppose a pair of result clusters is obtained as shown in FIG. 2 (only two Reads are drawn in each cluster; the rest are omitted): one result cluster is located near position pa of chromosome chra, and its paired result cluster is located near position pb of chromosome chrb. In the cluster on chra, Read1 is the left-end Read of its chromosome fragment and the adjacent Read2 is the right-end Read of its chromosome fragment, so the breakpoint pa of chra can be considered to lie between Read1 and Read2. The analysis on chrb is similar.
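The innermost-Read rule can be sketched as below. The sketch assumes that "inward" means toward the unsequenced interior of the fragment, i.e. to the right of left-end Reads and to the left of right-end Reads; that reading, the function name and the coordinate convention (Read start positions plus a fixed Read length) are assumptions, not statements from the source.

```python
def breakpoint_range_from_innermost(read_starts, read_len, cluster_end, l_lib, factor=1.0):
    """Estimate a breakpoint range from a cluster holding only one kind of Read.

    read_starts : start coordinates of the Reads in the cluster
    read_len    : Read length (L-r1 or L-r2)
    cluster_end : "left" if the cluster holds only left-end Reads, else "right"
    l_lib       : library size L-lib
    factor      : extension as a multiple of L-lib (the text suggests 0.5 to 2)
    """
    extension = int(factor * l_lib)
    if cluster_end == "left":
        # Innermost Read is the rightmost one; extend to the right of its end.
        inner = max(read_starts) + read_len
        return (inner, inner + extension)
    if cluster_end == "right":
        # Innermost Read is the leftmost one; extend to the left of its start.
        inner = min(read_starts)
        return (max(0, inner - extension), inner)
    raise ValueError("cluster_end must be 'left' or 'right'")


# Hypothetical coordinates with L-lib = 500 bp and 50 bp Reads.
print(breakpoint_range_from_innermost([1000, 1100, 1200], 50, "left", 500))
print(breakpoint_range_from_innermost([1000, 1100, 1200], 50, "right", 500))
```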
- The following result data can be output: the numbers of the two chromosomes in which a translocation may have occurred (the chromosomes on which the paired result clusters are located), the position ranges of the two ends of the paired result cluster (the position range of each end of the cluster on its chromosome, from which the span of each end can be obtained), the breakpoint range obtained by the analysis, and the like.
- Relevant parameters generated during filtering and other auxiliary parameters can also be output together, for example the compactness of each pair of result clusters, their linear correlation, the number of Reads supporting the pair of result clusters, and graphs or tables showing the RPU changes on both sides of the breakpoint.
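For the balanced-translocation case, where a left-end Read sits next to a right-end Read within one cluster, locating the breakpoint can be sketched as a scan over position-sorted Reads. The tuple representation `(position, end)` and the function name are assumptions for illustration, and Read length is ignored for brevity.

```python
def breakpoint_between_opposite_ends(reads):
    """Find the first pair of adjacent Reads that come from opposite fragment ends.

    reads : iterable of (position, end) tuples, with end being "left" or "right"
    Returns (pos1, pos2) bounding the breakpoint, or None if every Read in the
    cluster comes from the same end (the innermost-Read rule then applies).
    """
    ordered = sorted(reads)
    for (pos1, end1), (pos2, end2) in zip(ordered, ordered[1:]):
        if end1 != end2:
            return (pos1, pos2)
    return None


# Hypothetical cluster on chra: two left-end Reads followed by two right-end Reads.
cluster = [(1500, "right"), (1000, "left"), (1200, "left"), (1700, "right")]
print(breakpoint_between_opposite_ends(cluster))
```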
- FIG. 3 shows a case of tandem repeat, in which the two ends of a pair of result clusters (only one Read is drawn in each cluster; the rest are omitted) are located between the start and end points of the repeated segment. Therefore, the start and end points of the repeated segment can be considered to lie in ranges extending outward from the outermost Reads at the two ends of the cluster (these two Reads do not necessarily belong to the same pair of Reads).
- The result data output for a repeat-type structural abnormality is roughly the same as above. The differences are that the chromosome numbers at both ends of the cluster are the same, and that data indicating the estimated length of the repeated segment can also be output.
- Both ends of the paired result cluster (only one Read is drawn in each cluster; the rest are omitted) are located outside the start and end points of the deleted segment, so the start and end points of the deleted segment can be considered to lie in ranges extending inward from the closest Reads at the two ends of the cluster (these two Reads do not necessarily belong to the same pair of Reads).
- The result data output for a deletion-type structural abnormality is approximately the same as for a repeat-type abnormality, except that the output length of the segment between the estimated breakpoints represents the length of the deleted segment.
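The repeat and deletion cases differ only in which Reads bound the segment and in the direction of extension. The sketch below assumes each cluster end is given as a list of `(start, end)` Read coordinates on the same chromosome; the function names, the dictionary layout and the example coordinates are hypothetical.

```python
def repeat_breakpoints(left_reads, right_reads, extension):
    """Tandem repeat: the clusters lie inside the repeated segment, so the segment
    spans at least the two farthest positions and the breakpoints lie outward."""
    lo = min(start for start, _ in left_reads)
    hi = max(end for _, end in right_reads)
    return {
        "segment": (lo, hi),
        "start_breakpoint": (max(0, lo - extension), lo),
        "end_breakpoint": (hi, hi + extension),
        "estimated_length": hi - lo,
    }


def deletion_breakpoints(left_reads, right_reads, extension):
    """Deletion: the clusters lie outside the deleted segment, so the segment lies
    between the two nearest positions and the breakpoints lie inward."""
    lo = max(end for _, end in left_reads)
    hi = min(start for start, _ in right_reads)
    inward = min(extension, hi - lo)  # never extend past the opposite end
    return {
        "segment": (lo, hi),
        "start_breakpoint": (lo, lo + inward),
        "end_breakpoint": (hi - inward, hi),
        "estimated_length": hi - lo,
    }


# Hypothetical Read coordinates for the two ends of one paired result cluster.
left = [(1000, 1050), (1100, 1150)]
right = [(5000, 5050), (5200, 5250)]
print(repeat_breakpoints(left, right, 500))
print(deletion_breakpoints(left, right, 500))
```

For a repeat, `estimated_length` is a lower bound on the repeated segment; for a deletion it is an upper bound on the deleted segment.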
- Step 6. Breakpoint assembly.
- N can be set reasonably according to the length L-r1/L-r2. Because the unique alignment rate drops sharply when the sequence length is less than 25 bp, N should be chosen so that the truncated subsequences are not shorter, or not significantly shorter, than 25 bp.
- The range of the breakpoint can thereby be effectively narrowed.
- Probes can further be prepared according to the position range of the breakpoint, and other accurate sequencing methods, such as Sanger sequencing, can be used to obtain the exact breakpoint position for further study of the breakpoint. If the breakpoint range does not need to be narrowed, this step can be omitted.
- An apparatus for detecting chromosome structural abnormalities, comprising: a data input unit for inputting data; a data output unit for outputting data; a storage unit for storing data, including an executable program; and a processor, in data connection with the data input unit, the data output unit, and the storage unit, for executing the executable program stored in the storage unit, where execution of the program includes completing all or part of the steps of the various methods in the foregoing embodiments.
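The choice of N and the truncation of patch sequences in Step 6 can be sketched as follows. The helper names are assumptions; the 25 bp floor comes from the text, and the re-alignment of the resulting subsequences is left to whatever aligner is in use.

```python
def max_segments(read_len, min_len=25):
    """Largest N such that each of N equal pieces is at least min_len long."""
    return max(1, read_len // min_len)


def split_patch(sequence, n=2):
    """Cut one patch sequence into n consecutive subsequences.

    The last piece absorbs any remainder, so no bases are dropped.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    size = len(sequence) // n
    pieces = [sequence[i * size:(i + 1) * size] for i in range(n - 1)]
    pieces.append(sequence[(n - 1) * size:])
    return pieces


# Hypothetical 50 bp patch sequence (PE50 Reads), cut into N = 2 pieces.
patch = "ACGT" * 12 + "AC"
print(max_segments(len(patch)))
print([len(p) for p in split_patch(patch, 2)])
```

With 50 bp Reads, `max_segments` yields 2, which matches the preferred value of N stated in the claims.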
- L-lib is 500bp
- PE50 paired-end sequencing is used; L-r1 and L-r2 are each essentially 50 bp;
- V-lib is ±45 bp
- The V-rm for RPK is 20%
- V-cl is 10Kbp (inter-cluster distance is defined as the distance between two nearest Reads)
- the minimum number of Reads in the cluster is 2
- R-va: the variance ranking is within the lowest 5%
- the control set includes 90 normal samples with a V-con of 5.
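To make the roles of V-cl, the minimum Read count and the variance ranking concrete, here is a minimal sketch of gap-based clustering followed by a compactness filter. It is an illustration under stated assumptions rather than the patent's implementation: Reads are reduced to single positions on one chromosome, at least one cluster is always kept, and the removal of each filtered cluster's paired cluster is not modeled.

```python
import math
from statistics import pvariance


def cluster_positions(positions, v_cl=10_000, min_reads=2):
    """Group sorted positions; a gap larger than v_cl between neighbours starts a new cluster."""
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > v_cl:
            if len(current) >= min_reads:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_reads:
        clusters.append(current)
    return clusters


def compactness(cluster, trim=0.0):
    """Variance of the positions, optionally ignoring a fraction of Reads at each end."""
    k = int(len(cluster) * trim)
    core = cluster[k:len(cluster) - k] or cluster
    return pvariance(core)


def keep_most_compact(clusters, fraction=0.05):
    """Keep the clusters whose variance ranks in the lowest `fraction` of all clusters."""
    ranked = sorted(clusters, key=compactness)
    n_keep = max(1, math.ceil(len(ranked) * fraction))  # assumption: keep at least one
    return ranked[:n_keep]


# Hypothetical Read positions on one chromosome.
positions = [100, 200, 350, 50_000, 50_100, 200_000]
clusters = cluster_positions(positions)
print(clusters)
print(keep_most_compact(clusters))
```

The isolated Read at 200,000 is discarded because it cannot form a cluster of at least 2 Reads.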
- This example is a family study of cri-du-chat syndrome.
- The two target individuals in this example belong to one family, where "FA" denotes the father and "SON" denotes the son.
- Low-coverage whole-genome sequencing was performed on each of the two target individuals, with a sequencing depth of 2.2 for "FA" and 3.1 for "SON".
- Numbers of the two chromosomes on which the paired result cluster is located: chr12, chr5
- Numbers of the two chromosomes on which the paired result cluster is located: chr12, chr5
- Change of RPK in the relevant regions of the chromosomes: as shown in Figure 7, the abscissa is the position on the chromosome in units of 10 Kbp, the ordinate is RPK, the curve is drawn from the data of SON.pair, and pa and pb are the breakpoint positions. The figure shows that the RPK of "SON" changes markedly. According to the calculated RPK values, the RPK of the short arm of chromosome 5 of SON is only 0.5 times the average, while the RPK of the short arm of chromosome 12 is 0.5 times higher than the average.
- This example is a study of congenital heart disease.
- The target individual in this example is a patient with congenital heart disease, denoted "XX".
- The sequencing results are then aligned to the reference sequence HG19 using the SOAP alignment software to obtain XX.sin.
- Numbers of the two chromosomes on which the paired result cluster is located: chr14, chr14
- Position ranges of the two ends of the paired result cluster: 7357040-73557288, 73670432-73670682
- Compactness (variance) at the left and right ends: 100.63, 100.59
Claims (14)
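- A method for detecting chromosomal structural abnormalities, characterized by comprising the following steps: obtaining whole-genome sequencing results of a target individual, the sequencing results comprising multiple read pairs, each read pair consisting of two reads located at the two ends of the sequenced chromosome fragment, the two reads of each pair coming respectively from the positive strand and the negative strand of the corresponding chromosome fragment, or both coming from the positive strand or both from the negative strand of the corresponding chromosome fragment; aligning the sequencing results with a reference sequence to obtain an abnormal match set, the abnormal match set comprising first-type read pairs meeting the following description: the two reads of a first-type read pair match to different chromosomes of the reference sequence; clustering the reads in the abnormal match set into clusters according to the matched positions, each cluster containing the single-end reads of a group of read pairs, with the reads of the corresponding other end located in another cluster; filtering the clusters obtained by clustering, including calculating the compactness of each cluster and filtering out clusters whose compactness does not meet a preset requirement R-va together with their paired clusters, to obtain filtered result clusters containing first-type read pairs, for use in determining the occurrence of translocation-type chromosomal structural abnormalities.
- The method according to claim 1, characterized in that filtering the clusters obtained by clustering further comprises: calculating the linear correlation of the two clusters of a pair and filtering out paired clusters whose linear correlation does not meet a preset requirement R-li; and/or comparing the paired clusters with a preset control set containing multiple normal samples and filtering out paired clusters for which the number of normal samples hit reaches a preset threshold V-con.
- The method according to claim 1, characterized by further comprising: searching the result clusters containing first-type read pairs; if two adjacent reads occupy opposite positions in their respective read pairs, taking the range between the positions to which these two reads match as the breakpoint range; if no such reads exist, obtaining the position of the innermost read and extending a preset length outward from that position as the breakpoint range.
- The method according to claim 1, characterized in that the abnormal match set further comprises second-type read pairs meeting the following description: the two reads of a second-type read pair match to the same chromosome of the reference sequence, but the chromosome fragment length L-pr calculated from the matched positions is negative; filtered result clusters containing second-type read pairs are also obtained, for use in determining the occurrence of tandem-repeat-type chromosomal structural abnormalities.
- The method according to claim 4, characterized by further comprising: searching the result clusters containing second-type read pairs, taking the range between the two farthest-apart matched positions in the paired clusters as the range in which the repeat occurs, and extending a preset length outward from each of these two positions as the breakpoint range.
- The method according to claim 1, characterized in that the abnormal match set further comprises third-type read pairs meeting the following description: the two reads of a third-type read pair match to the same chromosome of the reference sequence, but the chromosome fragment length L-pr calculated from the matched positions is greater than the library size L-lib and the deviation exceeds a preset threshold V-lib, V-lib preferably being 5%×L-lib to 15%×L-lib, more preferably 10%×L-lib; filtered result clusters containing third-type read pairs are also obtained, for use in determining the occurrence of deletion-type chromosomal structural abnormalities.
- The method according to claim 6, characterized by further comprising: searching the result clusters containing third-type read pairs, taking the range between the two nearest matched positions in the paired clusters as the range in which the deletion occurs, and extending a preset length inward from each of these two positions as the breakpoint range.
- The method according to any one of claims 1-7, characterized in that aligning the sequencing results with the reference sequence further comprises obtaining a normal match set, the normal match set comprising read pairs meeting the following description: the two reads of the read pair match to the same chromosome of the reference sequence, the strand relationship of the matched positions is consistent with the strand relationship within the read pair, and the deviation between the chromosome fragment length L-pr calculated from the matched positions and the size L-lib of the library used for sequencing is less than a preset threshold V-lib, V-lib preferably being 5%×L-lib to 15%×L-lib, more preferably 10%×L-lib; counting the number of Reads of the normal match set contained per unit length, RPU, and obtaining the change of RPU relative to the average, for use in assisting the determination of structural abnormalities; preferably, the change of RPU relative to the average is expressed by whether the change of RPU exceeds a preset threshold V-rm; preferably, V-rm is 10% to 30%, more preferably 20%.
- The method according to any one of claims 1-7, characterized in that aligning the sequencing results with the reference sequence further comprises obtaining an unmatched set, the unmatched set comprising reads that cannot be matched to the reference sequence, including reads that are unmatched as a pair or reads that are unmatched at a single end; and, after the result clusters are obtained, further comprising: obtaining the single-end reads within a set range around the determined breakpoint range, extracting from the unmatched set the reads paired with them as patch sequences, cutting all patch sequences into N segments, N preferably being 2, re-aligning the subsequences obtained by cutting the patch sequences with the reference sequence, and assembling the breakpoint region according to the results that match normally.
- The method according to any one of claims 1-7, characterized in that, when calculating the compactness of each cluster, the 5% to 25% of reads located at each of the two ends of the cluster are excluded from the calculation; and/or compactness is expressed as variance, and R-va is set such that the variance ranks within the lowest 2% to 10% of all clusters, preferably 5%.
- The method according to claim 2, characterized in that, when calculating the linear correlation of the two clusters of a pair, linear correlation is expressed by the correlation coefficient, and R-li is set such that the correlation coefficient ranks within the highest 2% to 10% of all clusters, preferably 5%; and/or the ratio of V-con to the number of normal samples in the control set is 3%-10%, preferably 5%-6%.
- The method according to claim 1, characterized in that the size of the library used for sequencing L-lib ≥ 300 bp, preferably 500 bp or 5 Kbp; and/or the length of the reads is greater than or equal to 25 bp, preferably 50 bp ± 10%.
- An apparatus for detecting chromosomal structural abnormalities, characterized by comprising: a data input unit for inputting data; a data output unit for outputting data; a storage unit for storing data, including an executable program; and a processor, in data connection with the data input unit, the data output unit and the storage unit, for executing the executable program, execution of the program including completing the method according to any one of claims 1-12.
- A computer-readable storage medium, characterized in that it stores a program for execution by a computer, execution of the program including completing the method according to any one of claims 1-12.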
Priority Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP13884613.4A EP2998407B2 (en) | 2013-05-15 | 2013-05-15 | Method for detecting chromosomal structural abnormalities and device therefor |
HUE13884613A HUE047501T2 (hu) | 2013-05-15 | 2013-05-15 | Method for detecting chromosomal structural abnormalities and device therefor |
ES13884613T ES2766860T5 (es) | 2013-05-15 | 2013-05-15 | Method for detecting chromosomal structural abnormalities and device therefor |
US14/890,989 US11004538B2 (en) | 2013-05-15 | 2013-05-15 | Method and device for detecting chromosomal structural abnormalities |
PCT/CN2013/075622 WO2014183270A1 (zh) | 2013-05-15 | 2013-05-15 | Method and device for detecting chromosomal structural abnormalities |
PL13884613.4T PL2998407T5 (pl) | 2013-05-15 | 2013-05-15 | Method for detecting chromosomal structural abnormalities and device therefor |
RU2015153453A RU2654575C2 (ru) | 2013-05-15 | 2013-05-15 | Method and device for detecting chromosomal structural abnormalities |
CN201380004734.0A CN104302781B (zh) | 2013-05-15 | 2013-05-15 | Method and device for detecting chromosomal structural abnormalities |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2013/075622 WO2014183270A1 (zh) | 2013-05-15 | 2013-05-15 | Method and device for detecting chromosomal structural abnormalities |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014183270A1 (zh) | 2014-11-20 |
Family
ID=51897591
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2013/075622 WO2014183270A1 (zh) | 2013-05-15 | 2013-05-15 | Method and device for detecting chromosomal structural abnormalities |
Country Status (8)
Country | Link |
---|---|
US (1) | US11004538B2 (zh) |
EP (1) | EP2998407B2 (zh) |
CN (1) | CN104302781B (zh) |
ES (1) | ES2766860T5 (zh) |
HU (1) | HUE047501T2 (zh) |
PL (1) | PL2998407T5 (zh) |
RU (1) | RU2654575C2 (zh) |
WO (1) | WO2014183270A1 (zh) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
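CN107688727B (zh) * | 2016-08-05 | 2020-07-14 | 深圳华大基因股份有限公司 | Method and apparatus for biological sequence clustering and transcript isoform identification in full-length transcriptomes |
CN107058465B (zh) * | 2016-10-14 | 2021-10-01 | 南方科技大学 | Method for detecting balanced chromosomal translocations using haploid sequencing technology |
CN106845155B (zh) * | 2016-12-29 | 2021-11-16 | 安诺优达基因科技(北京)有限公司 | Apparatus for detecting internal tandem duplications |
CN106709276A (zh) * | 2017-01-21 | 2017-05-24 | 深圳昆腾生物信息有限公司 | Method and system for analyzing the causes of gene variation |
CN109280702A (zh) * | 2017-07-21 | 2019-01-29 | 深圳华大基因研究院 | Method and system for determining chromosomal structural abnormalities in an individual |
CN108830044B (zh) * | 2018-06-05 | 2020-06-26 | 序康医疗科技(苏州)有限公司 | Detection method and apparatus for detecting gene fusions in cancer samples |
CN109887547B (zh) * | 2019-03-06 | 2020-10-02 | 苏州浪潮智能科技有限公司 | Method, system and apparatus for accelerated filtering in gene sequence alignment |
CN112687341B (zh) * | 2021-03-12 | 2021-06-04 | 上海思路迪医学检验所有限公司 | Breakpoint-centered method for identifying chromosomal structural variation |
CN114743594B (zh) * | 2022-03-28 | 2023-04-18 | 深圳吉因加医学检验实验室 | Method, apparatus and storage medium for structural variation detection |
CN115910199B (zh) * | 2022-11-01 | 2023-07-14 | 哈尔滨工业大学 | Alignment-framework-based method for detecting structural variation in third-generation sequencing data |
CN115831223B (zh) * | 2023-02-20 | 2023-06-13 | 吉林工商学院 | Analysis method and system for mining chromosomal structural variation between closely related species |
CN118335196A (zh) * | 2024-06-13 | 2024-07-12 | 安诺优达基因科技(北京)有限公司 | Microchromosome assembly and identification apparatus, method and application thereof |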
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
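CN101561845A (zh) * | 2008-12-12 | 2009-10-21 | 深圳华大基因研究院 | Method and system for detecting syntenic homologous chromosome regions |
CN101914628A (zh) * | 2010-09-02 | 2010-12-15 | 深圳华大基因科技有限公司 | Method and system for detecting polymorphic sites in target regions of a genome |
CN102409099A (zh) * | 2011-11-29 | 2012-04-11 | 浙江大学 | Method for analyzing gene expression differences in porcine mammary tissue using sequencing technology |
WO2012097474A1 (zh) * | 2011-01-20 | 2012-07-26 | 深圳华大基因科技有限公司 | Method and system for detecting insertion sites of transgenic exogenous fragments |
CN102789553A (zh) * | 2012-07-23 | 2012-11-21 | 中国水产科学研究院 | Method and apparatus for assembling a genome using long transcriptome sequencing results |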
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7943304B2 (en) | 2005-01-12 | 2011-05-17 | Ramesh Vallabhaneni | Method and apparatus for chromosome profiling |
WO2011143231A2 (en) * | 2010-05-10 | 2011-11-17 | The Broad Institute | High throughput paired-end sequencing of large-insert clone libraries |
CN103384725A (zh) | 2010-12-23 | 2013-11-06 | 塞昆纳姆股份有限公司 | Detection of fetal genetic variation |
ES2766860T5 (es) | 2013-05-15 | 2023-02-23 | Bgi Genomics Co Ltd | Method for detecting chromosomal structural abnormalities and device therefor |
-
2013
- 2013-05-15 ES ES13884613T patent/ES2766860T5/es active Active
- 2013-05-15 RU RU2015153453A patent/RU2654575C2/ru active
- 2013-05-15 US US14/890,989 patent/US11004538B2/en active Active
- 2013-05-15 CN CN201380004734.0A patent/CN104302781B/zh active Active
- 2013-05-15 EP EP13884613.4A patent/EP2998407B2/en active Active
- 2013-05-15 WO PCT/CN2013/075622 patent/WO2014183270A1/zh active Application Filing
- 2013-05-15 HU HUE13884613A patent/HUE047501T2/hu unknown
- 2013-05-15 PL PL13884613.4T patent/PL2998407T5/pl unknown
Non-Patent Citations (1)
Title |
---|
See also references of EP2998407A4 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11004538B2 (en) | 2013-05-15 | 2021-05-11 | Bgi Genomics Co., Ltd. | Method and device for detecting chromosomal structural abnormalities |
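CN107077538A (zh) * | 2014-12-10 | 2017-08-18 | 深圳华大基因研究院 | Sequencing data processing apparatus and method |
CN107075564A (zh) * | 2014-12-10 | 2017-08-18 | 深圳华大基因研究院 | Method and apparatus for determining tumor nucleic acid concentration |
CN107077533A (zh) * | 2014-12-10 | 2017-08-18 | 深圳华大基因研究院 | Sequencing data processing apparatus and method |
CN107077538B (zh) * | 2014-12-10 | 2020-08-07 | 深圳华大生命科学研究院 | Sequencing data processing apparatus and method |
CN107077533B (zh) * | 2014-12-10 | 2021-07-27 | 深圳华大生命科学研究院 | Sequencing data processing apparatus and method |
CN111583996A (zh) * | 2020-04-20 | 2020-08-25 | 西安交通大学 | Model-independent genomic structural variation detection system and method |
CN111583996B (zh) * | 2020-04-20 | 2023-03-28 | 西安交通大学 | Model-independent genomic structural variation detection system and method |
CN118969073A (zh) * | 2024-10-21 | 2024-11-15 | 烟台大学 | Allele-aware insertion or deletion variant detection method and system |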
Also Published As
Publication number | Publication date |
---|---|
ES2766860T3 (es) | 2020-06-15 |
EP2998407A1 (en) | 2016-03-23 |
CN104302781A (zh) | 2015-01-21 |
US11004538B2 (en) | 2021-05-11 |
ES2766860T5 (es) | 2023-02-23 |
RU2654575C2 (ru) | 2018-05-21 |
PL2998407T5 (pl) | 2023-01-30 |
EP2998407A4 (en) | 2017-01-11 |
PL2998407T3 (pl) | 2020-05-18 |
CN104302781B (zh) | 2016-09-14 |
HUE047501T2 (hu) | 2020-04-28 |
EP2998407B2 (en) | 2022-11-30 |
EP2998407B1 (en) | 2019-12-04 |
RU2015153453A (ru) | 2017-06-20 |
US20160085911A1 (en) | 2016-03-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13884613; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 14890989; Country of ref document: US |
| NENP | Non-entry into the national phase | Ref country code: DE |
| WWE | Wipo information: entry into national phase | Ref document number: 2013884613; Country of ref document: EP |
| ENP | Entry into the national phase | Ref document number: 2015153453; Country of ref document: RU; Kind code of ref document: A |