Risk prediction method for improving the diagnostic accuracy of acute exacerbation of chronic obstructive pulmonary disease
Technical Field
The invention relates to the field of medical detection, in particular to a novel risk prediction method for acute exacerbation of chronic obstructive pulmonary disease.
Background
Chronic obstructive pulmonary disease (COPD) is a common chronic inflammatory airway disease characterized by persistent respiratory symptoms and airflow limitation. An acute exacerbation of COPD is a sustained worsening of a patient's respiratory symptoms over a short period of time; it further worsens the patient's health, accelerates the decline in pulmonary function, and increases mortality. Whether a patient with COPD develops an acute exacerbation is currently thought to be the result of a combination of multiple factors, among which genes play an important role.
COPD is a disease with high morbidity and mortality, and the additional treatment required for acute exacerbations places a huge economic burden on patients' families and on society. In recent years, genome-wide association studies and candidate gene studies have found a large number of polymorphic sites associated with COPD susceptibility. Therefore, establishing a corresponding model from the genotype data of individuals with acute exacerbation of COPD and of control individuals without acute exacerbation is very important for predicting the acute exacerbation risk of individuals with COPD.
With a risk prediction model, a person's risk of acute exacerbation of COPD can be calculated by measuring that person's genotype data. If the risk is high, more cautious treatment is required, and the patient also needs to pay more attention to prevention.
Genetic Risk Scoring (GRS) is an effective method for analyzing Single Nucleotide Polymorphisms (SNPs) and clinical phenotypes of complex diseases. A single SNP has a weak effect on a disease, while the GRS method integrates the weak effects of several SNPs, and has been widely used in clinical stages of disease diagnosis, treatment, prognosis, and the like. In the GRS method, whole genome sequencing is performed on a target population, and a GRS model is established using the obtained whole genome data, which can be expressed as:
$$\mathrm{GRS} = \sum_{i=1}^{n} \beta_i G_i$$
($\beta_i$ is the weight of the i-th SNP, $G_i$ indicates the number of risk alleles of the i-th SNP, and n is the number of SNPs). The algorithm considers that each risk allele has a different effect on the disease, and indicates the degree of effect of the SNP on the disease by assigning a corresponding weight to each risk allele.
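As a worked illustration of the weighted sum, assume three SNPs with purely hypothetical weights and risk-allele counts (these numbers are not taken from the invention's data):

```latex
% Hypothetical values: \beta = (0.40, 0.25, 0.10), G = (2, 1, 0), n = 3
\begin{align*}
\mathrm{GRS} &= \sum_{i=1}^{3} \beta_i G_i \\
             &= 0.40 \cdot 2 + 0.25 \cdot 1 + 0.10 \cdot 0 \\
             &= 1.05
\end{align*}
```

A SNP for which the individual carries no risk allele ($G_i = 0$) contributes nothing to the score, whatever its weight.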
However, the traditional GRS uses whole genome sequencing, which not only requires a large sample size, but also has a large data volume, complex processing, a slow output speed, and inaccurate localization, so that the research work is heavy and the time period for locating genes is long. Thus, there is a need for a method for more accurately and rapidly predicting the risk of acute exacerbation of chronic obstructive pulmonary disease.
Disclosure of Invention
In view of the above-mentioned needs in the art, the present invention utilizes Whole Exome Sequencing (WES) technology. An exome is the sum of all protein coding sequences in the genomic DNA of a single individual. Human exome sequences account for approximately 1% of the entire human genome sequence, but contain approximately 85% of the causative mutations. Compared with whole genome sequencing, whole exome sequencing is more economical and efficient, and has the following technical advantages: the protein coding sequences are sequenced directly to find variations that affect protein structure; the high sequencing depth allows rare variations with a frequency below 1% to be found; and because only exome regions are targeted, the sequencing cost, storage space, and workload are effectively reduced. An eGRS risk prediction model established from SNP data obtained by whole exome sequencing therefore has advantages in several respects over the traditional GRS.
The invention aims to overcome the existing problems and establish a novel prediction model for the acute exacerbation risk of patients with chronic obstructive pulmonary disease.
The prediction model of the risk of acute exacerbation of the patient with chronic obstructive pulmonary disease is obtained by the following steps:
1) obtaining the exon SNP genotypes of individuals with acute exacerbation of COPD and of individuals without acute exacerbation of COPD;
2) scoring each SNP in the data as 0, 1, or 2 according to the number of its risk alleles, and screening out the SNPs significantly associated with acute exacerbation of COPD;
3) further obtaining the SNPs significantly associated with acute exacerbation of COPD and their corresponding odds ratio (OR) values;
4) establishing an improved eGRS model by using the SNPs associated with acute exacerbation of COPD obtained in step 3) and their corresponding OR values.
In a specific embodiment of the invention, exon sequencing is first carried out on a large number of individuals with acute exacerbation of COPD and individuals without acute exacerbation of COPD, and the raw exon SNP genotype data of both groups are obtained.
In a preferred embodiment, the individual's exon SNP genotype data may be obtained using any method known in the art, including but not limited to the Ion Proton™ System and the 5500xl genetic analyzer from Life Technologies.
Unqualified genotype data are then removed through procedures such as quality control and Hardy-Weinberg equilibrium testing, and the qualified exon SNP genotype data are retained for further analysis.
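The Hardy-Weinberg equilibrium test mentioned above can be illustrated with a minimal Python sketch. It assumes a biallelic SNP summarized as three genotype counts and uses the standard one-degree-of-freedom chi-square test. The function name is hypothetical, and the text does not state which test variant or significance cutoff was actually used.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 degree of freedom) for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed genotype counts for a biallelic SNP.
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A SNP whose p-value falls below a chosen cutoff would be treated as deviating from equilibrium and removed. The cutoff is a design choice that this sketch leaves to the caller.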
In a specific embodiment of the invention, each SNP in the data is scored 0, 1, or 2 according to the number of its risk alleles. Specifically, for the exon SNP genotype data, a homozygote containing two risk alleles is scored 2, a heterozygote containing one risk allele is scored 1, and a homozygote without risk alleles is scored 0 (usually, the allele with the lower frequency is selected as the risk allele). The SNP data scored in this way are denoted $G_i$.
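The 0/1/2 scoring rule can be sketched in Python as follows. The two-character genotype strings (for example "AG") and the function name are assumptions made for illustration; the source does not specify a data format.

```python
def risk_allele_score(genotype, risk_allele):
    """Return the number of risk alleles (0, 1, or 2) in a diploid genotype.

    genotype is assumed to be a two-character string such as "AG".
    """
    if len(genotype) != 2:
        raise ValueError("expected a diploid genotype such as 'AG'")
    return sum(1 for allele in genotype if allele == risk_allele)
```

For example, with "A" as the risk allele, the genotypes "AA", "AG", and "GG" score 2, 1, and 0 respectively.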
In a specific embodiment of the invention, in order to screen out the SNPs significantly associated with acute exacerbation of COPD, a logistic regression analysis is performed to obtain the SNPs significantly associated with acute exacerbation of COPD (p < 0.01) and their corresponding OR values.
In a specific embodiment of the present invention, the logarithm of the OR value of each SNP obtained above that independently affects acute exacerbation of COPD is taken to obtain the corresponding weight $\beta_i$. The improved eGRS model is then established using all SNPs significantly associated with acute exacerbation of COPD. The eGRS is expressed as the sum of the products of the risk allele count $G_i$ of each SNP and the corresponding weight $\beta_i$, which gives the model for the risk of acute exacerbation of COPD, namely $\mathrm{eGRS} = \sum_{i} \beta_i G_i$.
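A minimal sketch of a single-SNP logistic regression is given below, assuming the model logit P(acute exacerbation) = b0 + b1·G fitted by Newton-Raphson with a Wald test on b1. The function names are hypothetical, and the fitting procedure is a swapped-in standard technique rather than the software actually used. The sketch has no safeguard against perfect separation, which can occur with rare variants in small samples.

```python
import math

def _gradient_and_information(b0, b1, x, y):
    """Score vector and Fisher information of the two-parameter logistic model."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return g0, g1, i00, i01, i11

def univariate_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x; return (odds_ratio, wald_p_value) for b1.

    x holds genotype scores (0, 1, 2); y holds 1 for acute exacerbation, else 0.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, i00, i01, i11 = _gradient_and_information(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        if det <= 0.0:
            raise ValueError("information matrix is singular")
        # Newton step: inverse information matrix times the score vector.
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    # Standard error of b1 from the information matrix at the final estimate.
    _, _, i00, i01, i11 = _gradient_and_information(b0, b1, x, y)
    det = i00 * i11 - i01 * i01
    se_b1 = math.sqrt(i00 / det)
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return math.exp(b1), p_value
```

A SNP would be retained when the returned p-value is below 0.01, matching the threshold stated in the text.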
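The weight derivation and the score itself can be sketched in Python. The SNP identifiers and dictionary layout are hypothetical, and the natural logarithm is assumed, as stated in the detailed description.

```python
import math

def egrs_weights(odds_ratios):
    """Map each SNP id to its weight beta_i = ln(OR_i)."""
    return {snp: math.log(odds_ratio) for snp, odds_ratio in odds_ratios.items()}

def egrs_score(weights, genotype_scores):
    """Return sum(beta_i * G_i) over the SNPs in the model.

    genotype_scores maps SNP id to 0, 1, or 2; a SNP missing from it
    raises KeyError rather than being silently treated as 0.
    """
    return sum(beta * genotype_scores[snp] for snp, beta in weights.items())
```

Note that a SNP with OR below 1 receives a negative weight, so carrying its scored allele lowers the eGRS.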
In another aspect of the present invention, there is provided a method of predicting the risk of acute exacerbation in a patient with chronic obstructive pulmonary disease, the method comprising the steps of:
1) obtaining the exon SNP genotype of the patient with chronic obstructive pulmonary disease;
2) using $\mathrm{eGRS} = \sum_{i} \beta_i G_i$ to predict the patient's risk of developing acute exacerbations; wherein $\beta_i$ indicates the association coefficient of the i-th acute exacerbation susceptibility SNP, and $G_i$ represents the genotype score of the i-th SNP, i.e., how many risk alleles of the i-th SNP the sample carries, which is one of 0, 1, and 2.
In another aspect, the invention provides the use of an exon SNP genotype in the preparation of a kit for predicting the risk of acute exacerbations in a patient with chronic obstructive pulmonary disease.
In a preferred embodiment of the use according to the invention, said exon SNP genotype is used in combination with the risk model $\mathrm{eGRS} = \sum_{i} \beta_i G_i$ for preparing a kit for predicting the acute exacerbation risk of patients with chronic obstructive pulmonary disease.
In a further preferred embodiment of the use according to the invention, the exon SNPs are one or more of the SNPs listed in table 1, or a combination thereof.
In a further preferred embodiment of the use according to the invention, the exon SNPs are all exon SNPs in the human genome.
In another aspect, the present invention provides a system for predicting the risk of acute exacerbations in a patient with chronic obstructive pulmonary disease, the system comprising:
1) data acquisition means for obtaining an exon SNP genotype of the subject;
2) data processing means for evaluating the formula $\mathrm{eGRS} = \sum_{i} \beta_i G_i$;
3) output means for classifying the subject, according to the result of evaluating the formula $\mathrm{eGRS} = \sum_{i} \beta_i G_i$, into a population at risk of acute exacerbation of chronic obstructive pulmonary disease or a population not at risk of acute exacerbation.
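The three means listed above could be arranged as in the following Python sketch. The cutoff value, the group labels, and the function name are all assumptions for illustration; the text does not state how the classification threshold is chosen.

```python
def classify_subject(weights, genotype_scores, cutoff):
    """Compute a subject's eGRS and assign a risk group.

    weights: SNP id -> beta_i (hypothetical values in the usage example)
    genotype_scores: SNP id -> 0, 1, or 2 (output of data acquisition)
    cutoff: hypothetical decision threshold, not given in the source
    """
    # Data processing: evaluate eGRS = sum(beta_i * G_i).
    score = sum(beta * genotype_scores[snp] for snp, beta in weights.items())
    # Output: assign the subject to one of the two populations.
    if score >= cutoff:
        group = "acute exacerbation risk"
    else:
        group = "non-acute exacerbation risk"
    return score, group
```

In practice the cutoff would have to be derived from training data, for example from a point on the ROC curve, which is a choice outside the scope of this sketch.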
Compared with the prior art, the invention has the following beneficial effects: the embodiment of the invention establishes a novel method for predicting the risk of acute exacerbation in patients with chronic obstructive pulmonary disease. Building on the existing GRS, the method combines GRS with WES to provide an improved eGRS method. Whole exome sequencing gives a higher sequencing depth, so that rare variations with a frequency lower than 1% can be found; and because only exome regions are targeted, the sequencing cost, storage space, and workload are effectively reduced. In addition, the eGRS method makes fuller use of the data, which improves the accuracy of the model.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a box plot of eGRS scores for stable and acutely exacerbated individuals;
FIG. 3 is the ROC curve of the predictions for the validation set samples.
Detailed Description
The invention is described in further detail below with reference to the figures and the detailed description. The following examples or figures are illustrative of the present invention and are not intended to limit the scope of the present invention.
As shown in FIG. 1, the invention is realized by the following technical scheme: detecting exon SNP genotype data of individuals with acute exacerbation of COPD and individuals without acute exacerbation of COPD; scoring the data 0, 1, or 2 according to the number of risk alleles; screening for SNPs significantly associated with acute exacerbation of COPD; further obtaining the SNPs that independently affect acute exacerbation of COPD and their corresponding OR values; then establishing an improved eGRS model from this genetic information; and finally using the model to predict acute exacerbation of COPD.
Specifically, the risk model of the present invention is obtained as follows:
1. acquiring exon genotype data of individuals with acute exacerbation of chronic obstructive pulmonary disease and individuals with non-acute exacerbation of chronic obstructive pulmonary disease;
2. transforming SNP data
One allele of each SNP is designated as the risk allele (usually the allele with the lower frequency), and the other allele is the reference allele. A homozygote containing two risk alleles is scored 2, a heterozygote containing one risk allele and one non-risk allele is scored 1, and a homozygote containing no risk allele is scored 0; the scores are denoted $G_i$.
3. Screening to obtain the SNPs significantly associated with acute exacerbation of COPD and their corresponding OR values
SNPs (p < 0.01) and their corresponding OR values, which were significantly associated with chronic obstructive pulmonary acute exacerbations, were obtained by logistic regression.
4. Establishing corresponding eGRS model
The natural logarithm of the OR value of each SNP is taken to obtain the weight $\beta_i$ of each variable. The improved eGRS is expressed as the sum of the products of the risk allele count $G_i$ of each SNP and the corresponding weight $\beta_i$. The model for the risk of acute exacerbation of COPD is therefore $\mathrm{eGRS} = \sum_{i} \beta_i G_i$.
5. Prediction of risk of acute exacerbation of chronic obstructive pulmonary disease
To predict an individual's risk of acute exacerbation of COPD, only the individual's exon genotype data need to be measured; the risk is then calculated with the model.
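Step 2 above usually selects the lower-frequency allele as the risk allele. A minimal Python sketch of that selection is shown here, assuming a biallelic SNP and two-character genotype strings; the function name is hypothetical.

```python
from collections import Counter

def minor_allele(genotypes):
    """Return the less frequent allele among diploid genotypes such as "AG".

    Assumes a biallelic SNP that is polymorphic in the sample. If both
    alleles are equally frequent, the one seen first is returned.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) != 2:
        raise ValueError("expected exactly two distinct alleles")
    return min(counts, key=counts.get)
```

For the genotypes "AA", "AG", "AA", "GG", allele G occurs 3 times against 5 for A, so G would be taken as the risk allele.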
The exon SNP data used by the inventors for modeling came from 72 COPD inpatients of the respiratory medicine department of a hospital in Beijing, of whom 30 were in the stable phase and 42 were in the acute exacerbation phase. The 72 patients were randomly divided into a training set of 52 and a validation set of 20. Blood samples from all 72 patients were tested to obtain exon SNP genotype data. The blood exon SNP genotype data of the 52 training-set patients were subjected to quality control to remove unqualified SNPs, using the following criteria: SNPs with a detection rate lower than 95% were removed, and the SNP detection rate of each individual was required to be not lower than 90%. The exon SNPs of the 52 patients were converted to scores of 0, 1, or 2 according to the number of risk alleles (usually, the allele with the lower frequency is selected as the risk allele) and denoted $G_i$. Univariate logistic regression analysis was then performed to obtain the exon SNPs significantly associated with acute exacerbation of COPD, together with their OR values and 95% confidence intervals, and the weight $\beta_i$ of each exon SNP was obtained by taking the natural logarithm of its OR value. In total, 90 exon SNPs associated with acute exacerbation of COPD were obtained, as shown in Table 1:
TABLE 1 90 SNPs significantly associated with chronic obstructive pulmonary acute exacerbation
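The random split and the detection-rate quality control described for the 72 patients can be sketched in Python. The nested-dictionary data layout, the function names, and the use of `None` for a missing call are assumptions. The source does not say in which order the two filters were applied, so both rates are computed here on the unfiltered data.

```python
import random

def split_train_validation(sample_ids, n_train, seed=0):
    """Randomly split sample ids into a training and a validation list."""
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def quality_control(genotypes, min_snp_rate=0.95, min_sample_rate=0.90):
    """Filter SNPs and samples by detection (call) rate.

    genotypes: {sample_id: {snp_id: genotype or None}}, every sample
    having the same SNP ids; None marks an undetected genotype.
    """
    samples = list(genotypes)
    snps = list(next(iter(genotypes.values())))
    kept_snps = [
        snp for snp in snps
        if sum(genotypes[s][snp] is not None for s in samples) / len(samples)
        >= min_snp_rate
    ]
    kept_samples = [
        s for s in samples
        if sum(genotypes[s][snp] is not None for snp in snps) / len(snps)
        >= min_sample_rate
    ]
    return {s: {snp: genotypes[s][snp] for snp in kept_snps} for s in kept_samples}
```

Calling `split_train_validation(range(72), 52)` reproduces the 52/20 partition sizes used in this embodiment, although not the actual assignment of patients.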
An improved eGRS model was established using the exon SNP data of the 52 training-set patients that were significantly associated with acute exacerbation of COPD: $\mathrm{eGRS} = \sum_{i} \beta_i G_i$. Based on the eGRS model, we used a linear regression model for diagnostic prediction between stable-phase COPD and acutely exacerbated COPD.
To test whether the model was established successfully, we analyzed the exon SNP data of the 20 validation-set patients with the improved eGRS to obtain their eGRS scores (Table 2) and prediction results. The corresponding ROC curve is shown in FIG. 3; its area under the curve is 0.740, suggesting good predictive performance.
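The area under the ROC curve can be computed directly from scores and labels. The sketch below uses the pairwise (Mann-Whitney) formulation, which is a standard equivalent of the area under the empirical ROC curve. The function name and the example data are hypothetical and do not reproduce the validation set.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for acute exacerbation, 0 for stable.
    Returns the fraction of (positive, negative) pairs in which the
    positive sample has the higher score, counting ties as one half.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.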
TABLE 2 eGRS score results for some individuals
Wherein, A represents a patient with acute exacerbation of COPD (n = 20), and S represents a COPD patient without acute exacerbation (n = 16).
The improved eGRS model predicted the risk of acute exacerbation for the validation-set patients with an accuracy of 74%, which shows that the model has considerable potential for predicting whether a COPD patient will experience an acute exacerbation.
In summary, the embodiment of the present invention provides an improved method for predicting the risk of acute exacerbation of chronic obstructive pulmonary disease. Building on the existing GRS method, it combines GRS with WES to provide an improved eGRS method. Whole exome sequencing gives a higher sequencing depth, so that rare variations with a frequency lower than 1% can be found; and because only exome regions are targeted, the sequencing cost, storage space, and workload are effectively reduced. In addition, the eGRS method makes fuller use of the data, which improves the accuracy of the model.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention is intended to be included within the scope of the present invention.