CN115831234A - Chromosome instability based early cancer screening and diagnosing method - Google Patents
- Publication number
- CN115831234A (application number CN202310023027.1A)
- Authority
- CN
- China
- Prior art keywords
- chromosome
- calculating
- reads
- healthy
- file
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention relates to the technical field of medical molecular biology, and in particular to a method for early cancer screening and diagnosis based on chromosome instability. The method comprises performing gene-targeted sequencing on a sample to obtain a raw fastq file, performing quality control on the raw fastq file and aligning it to a reference genome to obtain a bam file, establishing blacklist regions, and calculating a chromosome instability value, TAZscore, at the chromosome half-arm level based on the blacklist regions. The method uses chromosome instability for early cancer prediction and proposes the TAZscore algorithm for the first time: after data optimization, two chromosome arms are used to compute the final performance index. Where sequencing cost must be controlled, the method maximizes the separation of healthy individuals from cancer patients and has good sensitivity and specificity.
Description
Technical Field
The invention relates to the technical field of medical molecular biology, in particular to a cancer early-stage screening and diagnosing method based on chromosome instability.
Background
At present, early cancer screening falls into two main categories. The first comprises traditional detection methods such as computed tomography scanning, endoscopy, and cytological smears, covering imaging detection, tumor marker detection, and pathological section detection (the gold standard). Imaging relies on X-ray, B-mode ultrasound, CT, magnetic resonance imaging, and the like, and by the time a tumor is found the optimal treatment window has generally been missed. Tumor marker detection has poor sensitivity and specificity, with many false positives and false negatives. Pathological sections are the gold standard but require a needle biopsy, and by the time a pathological section indicates tumor tissue the tumor is usually already at a middle or advanced stage.
The second category of early cancer screening is liquid biopsy. Liquid biopsy is less invasive than the methods above and is better suited to early cancer detection. At present, blood, urine, or saliva is mainly used as the liquid biopsy sample, and tumor-derived cells, DNA, mRNA, microRNA, proteins, and the like are detected in the sample to determine the state of a cancer patient. Because circulating tumor cells are scarce, DNA detection is the main approach in clinical practice. In recent studies, liquid biopsy based on detecting genetic variation in cfDNA has shown great potential for early cancer detection, and chromosome instability signals are an important branch of this work. Most cancer patients carry a population of cells with chromosome instability, a major form of genomic instability that is common across cancers.
Tumor formation is associated with genetic mutations and epigenetic modifications, and many studies have shown that chromosome instability can lead to errors in cell mitosis. There is also an important link between chromosome instability and metastasis: across multiple tumor types, chromosome instability is higher in metastases than in the primary tumors of cancer patients, and it thus contributes to cancer progression. How to use chromosome instability to assist early cancer prediction is therefore of great significance.
Disclosure of Invention
In view of the above-mentioned deficiencies in the background art, the present invention provides a diagnostic method for early screening of cancer based on chromosomal instability.
The technical scheme adopted by the invention is as follows: a method for early cancer screening and diagnosis based on chromosome instability, characterized by comprising the following steps:
S1. performing gene-targeted sequencing on a sample to obtain a raw fastq file; fastq is a text-based file format that stores biological sequences and their corresponding base qualities;
S2. performing quality control on the raw fastq file and aligning it to a reference genome to obtain a bam file;
S3. establishing blacklist regions;
S4. calculating the chromosome instability value TAZscore at the chromosome half-arm level based on the blacklist regions.
Preferably, S2 specifically includes the following steps:
S21. quality control of the fastq data: removing reads that contain adapters and reads with an excessive proportion of N; this step reduces the alignment error rate;
S22. aligning the fastq file to the reference genome file to generate a bam file; this step gives the specific position of each sequencing read on the reference genome;
S23. bam file de-duplication: removing redundant sequencing data from the bam file with SAMTools; this step filters out the redundant reads generated by PCR;
S24. bam file filtering: removing low-quality reads from the bam file with SAMTools;
S25. GATK re-alignment: re-aligning the de-duplicated bam file with GATK; this step mainly re-corrects the regions of sequence insertion or deletion found during BWA alignment.
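The quality-control step S21 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the adapter sequence and the 10% threshold for the proportion of N are placeholder assumptions, since the text gives neither, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def passes_qc(seq: str, adapter: str, max_n_fraction: float) -> bool:
    """Reject empty reads, reads containing the adapter, or reads with too many N."""
    if not seq or adapter in seq:
        return False
    return seq.upper().count("N") / len(seq) <= max_n_fraction


def filter_reads(
    lines: Iterable[str],
    adapter: str = "AGATCGGAAGAGC",   # assumption: placeholder adapter sequence
    max_n_fraction: float = 0.1,      # assumption: threshold not given in the text
) -> List[FastqRecord]:
    """Keep only the records that pass both QC checks."""
    return [r for r in parse_fastq(lines) if passes_qc(r[1], adapter, max_n_fraction)]
```

In practice `lines` would be an open fastq file handle; dedicated trimming tools usually clip adapters instead of discarding whole reads, which is a different design choice from the removal described in S21.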
Preferably, the blacklist regions in S3 include the regions with the top 20% of CV values in healthy genomes, the regions of the reference genome that contain gaps, and the regions of the reference genome annotated as ACRO1, ALR/Alpha, BSR/Beta, (CATTC)n, chrM, (GAATG)n, (GAGTG)n, HSATII, LSU-rRNA_Hsa, SSU-rRNA_Hsa, or TAR1. The blacklist regions therefore come not only from the literature and databases but also, more importantly, from healthy-sample data, which reflects the diversity of the data sources.
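One way to combine the three blacklist sources is to express all of them as 50 kb bins and take the union. The sketch below assumes 0-based half-open intervals and assumes that a bin is blacklisted as soon as a gap or repeat interval touches it; the text does not state either convention, and the function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

BIN_SIZE = 50_000  # 50 kb non-overlapping bins, as described in the text

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open
Bin = Tuple[str, int]            # (chromosome, bin index)


def bins_overlapping(intervals: Iterable[Interval], bin_size: int = BIN_SIZE) -> Set[Bin]:
    """Return every bin touched by at least one interval."""
    out: Set[Bin] = set()
    for chrom, start, end in intervals:
        if end <= start:
            continue  # skip empty or malformed intervals
        first = start // bin_size
        last = (end - 1) // bin_size
        for idx in range(first, last + 1):
            out.add((chrom, idx))
    return out


def build_blacklist(
    high_cv_bins: Iterable[Bin],
    gap_intervals: Iterable[Interval],
    repeat_intervals: Iterable[Interval],
    bin_size: int = BIN_SIZE,
) -> Set[Bin]:
    """Union of high-CV bins, gap bins, and repeat-annotation bins."""
    return (
        set(high_cv_bins)
        | bins_overlapping(gap_intervals, bin_size)
        | bins_overlapping(repeat_intervals, bin_size)
    )
```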
Preferably, the regions with the top 20% of CV values in healthy genomes are determined as follows: collecting the re-aligned bam files of healthy samples; cutting the reference genome into non-overlapping 50 kb regions; calculating the proportion of reads falling in each region; performing GC content correction; after GC correction, calculating for each region the mean (Mean) and standard deviation (SD) of the GC-corrected read proportion across all healthy individuals; finally calculating CV = Mean/SD and taking the 20% of regions with the largest CV values as one of the blacklist regions.
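The per-region CV ranking can be sketched as below. Note one assumption that departs from the wording above: the text writes CV = Mean/SD, whereas the coefficient of variation is conventionally SD/Mean, and the sketch uses SD/Mean so that the most variable regions rank highest. GC correction is assumed to have been applied already, and bins with a zero mean are treated as maximally variable, which is also an assumption.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def read_fractions(counts: Sequence[int]) -> List[float]:
    """Convert per-bin read counts of one sample into read proportions."""
    total = sum(counts)
    return [c / total for c in counts]


def top_cv_bins(
    fractions_by_sample: Sequence[Sequence[float]],
    top_fraction: float = 0.2,
) -> List[int]:
    """Return indices of the bins with the largest CV across healthy samples."""
    n_bins = len(fractions_by_sample[0])
    cv = []
    for b in range(n_bins):
        values = [sample[b] for sample in fractions_by_sample]
        m = mean(values)
        sd = stdev(values)  # sample standard deviation; needs >= 2 samples
        # Assumption: conventional CV = SD / Mean; zero-mean bins rank first.
        cv.append(sd / m if m > 0 else math.inf)
    n_top = math.ceil(top_fraction * n_bins)
    ranked = sorted(range(n_bins), key=lambda b: cv[b], reverse=True)
    return sorted(ranked[:n_top])
```

With three toy samples in which only the last of five bins varies strongly, `top_cv_bins` returns that single bin, since 20% of five bins is one bin.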
Preferably, S4 specifically includes the following steps:
S51. calculating the read content of the healthy samples in all bins;
S52. removing the bins in the blacklist regions;
S53. calculating the proportion of reads of each healthy sample in each chromosome half-arm;
S54. calculating, at each half-arm level, the mean (MeanHealthArm) and standard deviation (SDHealthArm) of the read content proportion across all healthy samples;
S55. calculating the Zscore of each chromosome arm and the TAZscore of the sample to be tested. The TAZscore calculation is proposed here for the first time; after data optimization, two chromosome arms are used to compute the final performance index, giving good sensitivity and specificity.
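Steps S51 to S54 can be sketched as follows. The keying of bins by (half-arm label, bin index) is a hypothetical layout chosen for illustration, and the proportions are computed relative to the reads retained after blacklist removal, which is an assumption the text does not spell out.

```python
from statistics import mean, stdev
from typing import Dict, Mapping, Sequence, Set, Tuple

Bin = Tuple[str, int]  # (half-arm label such as "1p", bin index); hypothetical keying


def arm_fractions(bin_counts: Mapping[Bin, int], blacklist: Set[Bin]) -> Dict[str, float]:
    """Sum non-blacklisted bin counts per half-arm and normalise to proportions."""
    arm_totals: Dict[str, int] = {}
    for (arm, idx), count in bin_counts.items():
        if (arm, idx) in blacklist:
            continue  # S52: drop blacklisted bins
        arm_totals[arm] = arm_totals.get(arm, 0) + count
    total = sum(arm_totals.values())
    return {arm: n / total for arm, n in arm_totals.items()}


def healthy_reference(
    healthy: Sequence[Mapping[str, float]],
) -> Dict[str, Tuple[float, float]]:
    """S54: per half-arm (MeanHealthArm, SDHealthArm) across healthy samples."""
    reference = {}
    for arm in healthy[0]:
        values = [sample[arm] for sample in healthy]
        reference[arm] = (mean(values), stdev(values))
    return reference
```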
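Preferably, the Zscore is calculated by formula 1:
$$\mathrm{Zscore} = \frac{\mathrm{CaseArm} - \mathrm{MeanHealthArm}}{\mathrm{SDHealthArm}}$$
where CaseArm is the read content proportion of the sample to be tested at the half-arm level, and MeanHealthArm and SDHealthArm are the mean and standard deviation of the same half-arm across the healthy samples.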
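The Zscore of each half-arm is a standard score of the test sample against the healthy reference. A minimal sketch, assuming the reference is held as a mapping from half-arm label to a (mean, standard deviation) pair; the function name and data layout are hypothetical.

```python
from typing import Dict, Mapping, Tuple


def arm_zscores(
    case_arm: Mapping[str, float],
    reference: Mapping[str, Tuple[float, float]],
) -> Dict[str, float]:
    """Zscore = (CaseArm - MeanHealthArm) / SDHealthArm for every half-arm."""
    scores = {}
    for arm, (mean_health, sd_health) in reference.items():
        if sd_health == 0:
            raise ValueError(f"zero standard deviation for half-arm {arm}")
        scores[arm] = (case_arm[arm] - mean_health) / sd_health
    return scores
```

For example, a half-arm with a test proportion of 0.5 against a healthy mean of 0.4 and standard deviation of 0.05 scores about 2.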
Preferably, the TAZscore is calculated using formula 2:
where the two largest Zscore values are taken.
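Selecting the two largest Zscore values can be sketched as below. How formula 2 combines them into TAZscore is not reproduced in this text, so the sketch stops at the selection. Whether "largest" refers to signed or absolute values is also not stated; the default follows the signed reading and the `use_absolute` switch is a hypothetical option.

```python
import heapq
from typing import List, Mapping


def two_largest_zscores(zscores: Mapping[str, float], use_absolute: bool = False) -> List[float]:
    """Return the two largest Zscore values across half-arms, largest first."""
    values = [abs(z) for z in zscores.values()] if use_absolute else list(zscores.values())
    return heapq.nlargest(2, values)
```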
Preferably, the samples are tissue fluid samples and bulk samples from healthy individuals and tumor patients. The tissue fluid samples include any one of tissue homogenate, nasal swab, virus fluid, blood, serum, semen, saliva, urine, and plasma; the bulk samples include any one of tissue blocks, transgenic mouse tails, and toenails. In a preferred scheme, the sample is a plasma sample.
Preferably, the stability of TAZscore is evaluated using formula 3:
where Max is the maximum TAZscore of the same healthy individual at the same depth gradient, or the maximum TAZscore of the same sample at different depths.
Advantageous effects: compared with the prior art, the method for early cancer screening and diagnosis based on chromosome instability uses chromosome instability for early cancer prediction and proposes the TAZscore algorithm for the first time. After data optimization, two chromosome arms are used to compute the final performance index. Where sequencing cost must be controlled, the method maximizes the separation of healthy individuals from cancer patients and has good sensitivity and specificity.
Drawings
FIG. 1 is a graph of AUC of classification performance based on the present invention;
FIG. 2 is a schematic of the stability of the same sample at different depths;
FIG. 3 is a graph showing the stability of the same sample at the same depth.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention will be described in detail below with reference to the accompanying drawings and the detailed description.
Example 1 sample data extraction
Plasma samples from two populations, healthy individuals (N = 32) and cancer patients (N = 112), were randomly selected for sequencing. The specific process is as follows:
cfDNA extraction: cfDNA was extracted from the plasma samples with a plasma extraction kit; for the specific procedure see the instructions for the QIAamp Circulating Nucleic Acid Kit (QIAGEN). The extracted DNA was quantified with a Qubit 4.0 and the dsDNA HS Assay Kit.
Library construction: end repair and addition of an A tail at the 3' end. 10-50 ng of cfDNA was placed in a PCR tube and made up to 50 μL with Low TE, and reagents were added according to Table 1 below.
TABLE 1
Components | Volume |
End repair and A-tailing buffer | 7 μL |
End repair and A-tailing enzyme | 3 μL |
cfDNA | 50 μL |
Total volume | 60 μL |
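The 50 μL cfDNA input in Table 1 is made up from the quantified extract and Low TE. The arithmetic can be sketched as follows; the function name and the example concentration are hypothetical, and only the 10-50 ng range and the 50 μL final volume come from the text.

```python
from typing import Tuple


def input_volumes(
    concentration_ng_per_ul: float,
    target_mass_ng: float,
    final_volume_ul: float = 50.0,
) -> Tuple[float, float]:
    """Return (cfDNA volume, Low TE volume) in μL for the end-repair input."""
    if not 10 <= target_mass_ng <= 50:
        raise ValueError("the protocol uses 10-50 ng of cfDNA")
    dna_ul = target_mass_ng / concentration_ng_per_ul
    if dna_ul > final_volume_ul:
        raise ValueError("extract too dilute to fit in the final volume")
    return dna_ul, final_volume_ul - dna_ul
```

For an extract measured at 2 ng/μL and a 30 ng input, this gives 15 μL of cfDNA topped up with 35 μL of Low TE.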
Vortex to mix, centrifuge briefly, and run the reaction on a PCR instrument with the program in Table 2:
TABLE 2
Step | Temperature | Time |
Step1 | 30℃ | 30min |
Step2 | 65℃ | 30min |
Step3 | 4℃ | ∞ |
Adapter ligation: after the above reaction, the following reagents were added to the system according to Table 3:
TABLE 3
Components | Volume |
End repair and A-tailing reaction product | 60 μL |
Ligation mix | 16.5 μL |
Ligase | 2.5 μL |
Adapter (15 μM) | 2.5 μL |
Total volume | 80 μL |
Vortex to mix, centrifuge briefly, and run the reaction on a PCR instrument (heated lid off) with the program in Table 4:
TABLE 4
Step | Temperature | Time |
Step1 | 20℃ | 15-30min |
Step2 | 4℃ | ∞ |
Post-ligation purification: Beckman Agencourt AMPure XP magnetic beads, stored at 2-8 °C, were equilibrated at room temperature for at least 30 min. To each sample, 80 μL (1X volume) of AMPure XP magnetic beads was added and mixed well by pipetting or shaking, then left to stand for 5 minutes at room temperature. The tube was placed on a magnetic rack for 2 minutes; once all beads had been drawn to the side wall, the supernatant was removed with a pipette, taking care not to disturb the beads. With the tube still on the magnetic rack, 200 μL of 80% ethanol was added slowly along the tube wall opposite the beads, left for 30 s-1 min, and the supernatant was removed with a pipette. This wash was repeated once, and residual ethanol was removed as completely as possible with a 10 μL pipette. The beads were dried for 5 minutes at room temperature, resuspended in 21 μL of low TE buffer per sample, mixed thoroughly by pipetting or shaking, and incubated for 1 minute at room temperature. The tube was placed on the magnetic rack and incubated for 2 minutes at room temperature; once the beads had been completely drawn to the side wall, 20 μL of supernatant was transferred to a new PCR tube for amplification.
Library amplification: after the above reaction, the following reagents were added to the system according to Table 5:
TABLE 5
Components | Volume |
Adaptor-ligated DNA | 20 μL |
2X | 25 μL |
Index primer | 5 μL |
Total volume | 50 μL |
Vortex to mix, centrifuge briefly, and run the reaction on a PCR instrument with the program in Table 6:
TABLE 6
After the reaction was completed, the PCR product was purified using 1X volume of magnetic beads according to the procedure of magnetic bead purification, and then the pre-library concentration was determined using dsDNA HS Assay Kit, and fragment size detection was performed using QIAxcel nucleic acid electrophoresis analysis system.
cfDNA whole-genome sequencing: the library samples were sequenced on an MGI 2000 second-generation sequencer.
Example 2 TAZscore calculation to differentiate the cancer patient group from the healthy group
Obtaining fastq files and generating bam files: the BCL files produced by the sequencing platform were split by sample index to obtain fastq data for each sample; reads containing adapters and reads with an excessive proportion of N were removed; the fastq files were aligned to the reference genome with BWA to generate bam files; redundant (duplicate) sequencing data and low-quality reads were removed from the bam files with SAMTools; and the de-duplicated bam files were re-aligned with GATK to re-correct regions where insertions or deletions were found during alignment;
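The SAMTools de-duplication and filtering steps could be scripted along the following lines. The text names only the tool, so the choice of subcommands (`collate`, `fixmate -m`, `sort`, `markdup -r`, `view -q`), the mapping-quality cutoff of 30, and the output file names are all assumptions. The sketch only builds the argument lists and does not run them.

```python
from typing import List


def dedup_and_filter_commands(bam_in: str, prefix: str, min_mapq: int = 30) -> List[List[str]]:
    """Build samtools command lines for duplicate removal and MAPQ filtering."""
    collated = f"{prefix}.collate.bam"
    fixmate = f"{prefix}.fixmate.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    dedup = f"{prefix}.dedup.bam"
    filtered = f"{prefix}.filtered.bam"
    return [
        # Group reads by name so mate information can be filled in.
        ["samtools", "collate", "-o", collated, bam_in],
        # Add mate score tags, which markdup needs.
        ["samtools", "fixmate", "-m", collated, fixmate],
        # Coordinate-sort before duplicate marking.
        ["samtools", "sort", "-o", sorted_bam, fixmate],
        # -r removes duplicates instead of only flagging them.
        ["samtools", "markdup", "-r", sorted_bam, dedup],
        # Assumption: "low-quality reads" is read as low mapping quality.
        ["samtools", "view", "-b", "-q", str(min_mapq), "-o", filtered, dedup],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where samtools is installed.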
Making the blacklist: the re-aligned BAM files of the 32 healthy individuals were collected; the reference genome was cut into non-overlapping 50 kb regions; the proportion of each healthy individual's reads in each region was calculated and then corrected for GC content; after GC correction, the mean (Mean) and standard deviation (SD) of the GC-corrected read proportion across all healthy individuals were calculated for each region; finally CV = Mean/SD was calculated, and the 20% of regions with the largest CV values were taken as one of the blacklist regions. Regions of the reference genome containing gaps were taken as one of the blacklist regions. The ACRO1, ALR/Alpha, BSR/Beta, (CATTC)n, chrM, (GAATG)n, (GAGTG)n, HSATII, LSU-rRNA_Hsa, SSU-rRNA_Hsa, and TAR1 regions of the reference genome were also taken as one of the blacklist regions.
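The text does not say how GC content correction is performed. One common approach, shown here purely as an assumed stand-in, groups bins by GC content and rescales each bin by the ratio of the overall median to the median of its GC stratum. The 1% stratum width and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Sequence


def gc_correct(
    fractions: Sequence[float],
    gc_content: Sequence[float],
    step: float = 0.01,
) -> List[float]:
    """Median-ratio GC correction of per-bin read proportions for one sample."""
    strata: Dict[int, List[float]] = defaultdict(list)
    for value, gc in zip(fractions, gc_content):
        strata[round(gc / step)].append(value)  # group bins with similar GC
    overall = median(fractions)
    corrected = []
    for value, gc in zip(fractions, gc_content):
        stratum_median = median(strata[round(gc / step)])
        # Bins in a stratum with no coverage cannot be rescaled; set them to 0.
        corrected.append(value * overall / stratum_median if stratum_median > 0 else 0.0)
    return corrected
```

After this rescaling, bins that differ only because of their GC stratum end up at the same level, so the remaining variation reflects something other than GC bias.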
Calculation of TAZscore: the read content of all bins was calculated for the 32 healthy individuals, the bins in the blacklist were removed, and the read content of all remaining bins was summed at the chromosome half-arm level to obtain the read content proportion of each half-arm; the mean (MeanHealthArm) and standard deviation (SDHealthArm) of the read content proportion across all healthy individuals were then calculated for each half-arm. The Zscore is calculated using formula 1:
$$\mathrm{Zscore} = \frac{\mathrm{CaseArm} - \mathrm{MeanHealthArm}}{\mathrm{SDHealthArm}}$$
where CaseArm is the read content proportion of the sample to be tested at the half-arm level;
then the two largest Zscore values were taken and the TAZscore of each sample was calculated using formula 2.
Based on ROC analysis of the TAZscore, FIG. 1 reports an AUC of 0.9934; at a TAZscore threshold of 2.31 the specificity is 90% and the sensitivity is 70%, and an AUC of 0.85 is also reported.
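The quantities in an ROC analysis of this kind can be computed as sketched below. The scores in the usage note are invented toy values and are not the patent's data; calling a sample positive when its score is at or above the threshold is an assumption.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    healthy: Sequence[float],
    cancer: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    true_pos = sum(score >= threshold for score in cancer)
    true_neg = sum(score < threshold for score in healthy)
    return true_pos / len(cancer), true_neg / len(healthy)


def auc(healthy: Sequence[float], cancer: Sequence[float]) -> float:
    """AUC as the probability that a cancer score exceeds a healthy score."""
    wins = 0.0
    for c in cancer:
        for h in healthy:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5  # ties count as half
    return wins / (len(cancer) * len(healthy))
```

With toy scores `healthy = [0.5, 1.0, 1.8, 2.5]` and `cancer = [1.5, 2.4, 3.0, 4.2]`, a threshold of 2.31 gives a sensitivity and specificity of 0.75 each, and the AUC is 0.8125.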
Example 3 stability verification
(1) Stability of the same sample at different depths: data from the same healthy sample at different depths (0.1X, 0.5X, 1X, 3X, 5X, and the original data) were selected as training data and the TAZscore was calculated. The results are shown in FIG. 2, where Origin is the stability of the same sample across sequencing depths 0.1X, 0.5X, 1X, 3X, 5X, and the original data; depthMore0_1X is the stability across 0.5X, 1X, 3X, 5X, and the original data; depthMore0_5X is the stability across 1X, 3X, 5X, and the original data; depthMore1X is the stability across 3X, 5X, and the original data; and the dashed red line marks a sample error of 5%;
(2) Stability of the same sample at the same depth: the results are shown in FIG. 3, where 0.1X, 0.5X, 1X, 3X, and 5X denote the stability of the same sample at sequencing depths of 0.1X, 0.5X, 1X, 3X, and 5X respectively, and the red line marks a sample error of 5%.
This demonstrates that TAZscore can distinguish samples of the healthy and tumor groups using 0.1X, 0.5X, 1X, 3X, 5X, and raw data, indicating that the scheme still classifies effectively and that the algorithm is stable.
Finally, it should be noted that the above-mentioned description is only a preferred embodiment of the present invention, and those skilled in the art can make various similar representations without departing from the spirit and scope of the present invention.
Claims (9)
1. A method for early cancer screening and diagnosis based on chromosome instability, characterized by comprising the following steps:
S1. performing gene-targeted sequencing on a sample to obtain a raw fastq file;
S2. performing quality control on the raw fastq file and aligning it to a reference genome to obtain a bam file;
S3. establishing blacklist regions;
S4. calculating the chromosome instability value TAZscore at the chromosome half-arm level based on the blacklist regions.
2. The method for early cancer screening and diagnosis based on chromosome instability according to claim 1, wherein S2 specifically comprises the following steps:
S21. quality control of the fastq data: removing reads that contain adapters and reads with an excessive proportion of N;
S22. aligning the fastq file to the reference genome file to generate a bam file;
S23. bam file de-duplication: removing redundant sequencing data from the bam file with SAMTools;
S24. bam file filtering: removing low-quality reads from the bam file with SAMTools;
S25. GATK re-alignment: re-aligning the de-duplicated bam file with GATK.
3. The method according to claim 1, wherein the blacklist regions in S3 include the regions with the top 20% of CV values in healthy genomes, the regions of the reference genome that contain gaps, and the regions of the reference genome annotated as ACRO1, ALR/Alpha, BSR/Beta, (CATTC)n, chrM, (GAATG)n, (GAGTG)n, HSATII, LSU-rRNA_Hsa, SSU-rRNA_Hsa, or TAR1.
4. The method for early cancer screening and diagnosis based on chromosome instability according to claim 3, characterized in that the regions with the top 20% of CV values in healthy genomes are determined as follows: collecting the re-aligned bam files of healthy samples; cutting the reference genome into non-overlapping 50 kb regions; calculating the proportion of reads in each region; performing GC content correction; after GC correction, calculating for each region the mean (Mean) and standard deviation (SD) of the GC-corrected read proportion across all healthy individuals; finally calculating CV = Mean/SD and taking the 20% of regions with the largest CV values as one of the blacklist regions.
5. The method for early cancer screening and diagnosis based on chromosome instability according to claim 1, characterized in that S4 specifically comprises the following steps:
S51. calculating the read content of the healthy samples in all bins;
S52. removing the bins in the blacklist regions;
S53. calculating the proportion of reads of each healthy sample in each chromosome half-arm;
S54. calculating, at each half-arm level, the mean (MeanHealthArm) and standard deviation (SDHealthArm) of the read content proportion across all healthy samples;
S55. calculating the Zscore of each chromosome arm and the TAZscore of the sample to be tested.
8. The method for early cancer screening and diagnosis based on chromosome instability according to claim 1, wherein: the samples are tissue fluid samples and bulk samples from healthy individuals and tumor patients; the tissue fluid samples include any one of tissue homogenate, nasal swab, virus fluid, blood, serum, semen, saliva, urine, and plasma; the bulk samples include any one of tissue blocks, transgenic mouse tails, and toenails.
9. The method for early cancer screening and diagnosis based on chromosome instability according to claim 1, wherein the stability of TAZscore is evaluated using formula 3:
where Max is the maximum TAZscore of the same healthy individual at the same depth gradient, or the maximum TAZscore of the same sample at different depths.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310023027.1A CN115831234A (en) | 2023-01-06 | 2023-01-06 | Chromosome instability based early cancer screening and diagnosing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115831234A true CN115831234A (en) | 2023-03-21 |
Family
ID=85520354
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310023027.1A Pending CN115831234A (en) | 2023-01-06 | 2023-01-06 | Chromosome instability based early cancer screening and diagnosing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115831234A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116994656A (en) * | 2023-09-25 | 2023-11-03 | 北京求臻医学检验实验室有限公司 | Method for improving second generation sequencing detection accuracy |
CN116994656B (en) * | 2023-09-25 | 2024-01-02 | 北京求臻医学检验实验室有限公司 | Method for improving second generation sequencing detection accuracy |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||