CN117747093B - Method for constructing idiopathic pulmonary fibrosis diagnosis model and diagnosis system - Google Patents
Method for constructing idiopathic pulmonary fibrosis diagnosis model and diagnosis system
- Publication number
- CN117747093B (application CN202410189821.8A)
- Authority
- CN
- China
- Prior art keywords
- genes
- chip data
- module
- diagnosis
- screening
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention relates to the technical field of genes, and in particular to a method for constructing an idiopathic pulmonary fibrosis (IPF) diagnosis model and to a diagnosis system. The system comprises a data acquisition module, a differential gene screening module, a characteristic gene screening module, a regression coefficient calculation module, a diagnosis model construction module and a diagnosis module. The data acquisition module acquires gene expression profile chip data of IPF patients from the GEO database and constructs a chip data training set. The differential gene screening module screens differential genes from the chip data training set by Bayesian test. The characteristic gene screening module screens characteristic genes based on a random forest classifier. The regression coefficient calculation module fits a logistic regression model to the training set based on the characteristic genes to obtain their regression coefficients. The diagnosis model construction module constructs the idiopathic pulmonary fibrosis diagnosis model, and the diagnosis module calculates a diagnosis score with that model from the expression levels of the characteristic genes of the person to be tested. The invention enables rapid screening for idiopathic pulmonary fibrosis, supports earlier, more accurate and simpler diagnosis of IPF, and improves prognosis.
Description
Technical Field
The invention relates to the technical field of genes, and in particular to a method for constructing an idiopathic pulmonary fibrosis diagnosis model and to a diagnosis system.
Background
Pulmonary fibrosis is the end-stage change of a large class of lung diseases. It is characterized by fibroblast proliferation and accumulation of large amounts of extracellular matrix, accompanied by inflammatory injury and destruction of tissue architecture; that is, normal alveolar tissue is repaired abnormally after being damaged, leading to structural abnormality (scar formation). In most patients with pulmonary fibrosis the etiology is unknown (idiopathic), and this group of diseases is called idiopathic interstitial pneumonia (IIP), a major category of interstitial pneumonia. Among the IIPs, the most common type whose main manifestation is pulmonary fibrosis is idiopathic pulmonary fibrosis (IPF), a serious interstitial lung disease that can lead to progressive loss of lung function. Pulmonary fibrosis severely impairs respiratory function, manifesting as dry cough and progressive dyspnea (a subjective feeling of shortness of breath), and the patient's respiratory function continues to deteriorate as the disease and lung injury progress. The incidence and mortality of idiopathic pulmonary fibrosis are increasing year by year; the average survival after diagnosis is only 2.8 years, and mortality is higher than that of most tumors, so the disease is described as "tumor-like".
Diagnosis of IPF requires high-resolution CT (HRCT), and some cases also require lung biopsy. IPF is often overlooked at initial presentation because it is clinically very similar to other diseases such as bronchitis, asthma and heart failure. Most patients have already reached the middle or late stage of the disease at the time of diagnosis, and the disease continues to worsen even with treatment.
Therefore, there is an urgent need for a diagnosis model that enables earlier, more accurate and simpler diagnosis of IPF and thereby improves prognosis.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a method for constructing an idiopathic pulmonary fibrosis diagnosis model and a diagnosis system, which address the difficulty of diagnosing IPF early, accurately and simply.
In order to solve the problems, the invention adopts the following technical scheme:
The idiopathic pulmonary fibrosis diagnosis system comprises a data acquisition module, a differential gene screening module, a characteristic gene screening module, a regression coefficient calculation module, a diagnosis model construction module and a diagnosis module;
The data acquisition module is used for acquiring the gene expression profile chip data of the IPF patient through the GEO database and constructing a chip data training set;
The differential gene screening module is used for analyzing differentially expressed genes between the IPF and control groups in the chip data training set by Bayesian test, with the screening condition adjusted p-value (p.adj) < 0.05 and |logFC| > 0.5, to screen out differential genes;
The characteristic gene screening module is used for screening the differential genes based on a random forest classifier to obtain characteristic genes;
The regression coefficient calculation module is used for fitting a logistic regression model to the training set based on the characteristic genes to obtain regression coefficients of the characteristic genes;
The diagnosis model construction module is used for constructing an idiopathic pulmonary fibrosis diagnosis model according to the expression levels of the characteristic genes and their regression coefficients;
The diagnosis module is used for calculating a diagnosis score with the idiopathic pulmonary fibrosis diagnosis model based on the expression levels of the characteristic genes of the person to be tested.
In one embodiment, acquiring the IPF patient gene expression profile chip data from the GEO database and constructing the chip data training set comprises:
Gene expression profile chip data of IPF patients are acquired from the GEO database, comprising GSE132607, GSE38958 and GSE28221. The data are log2-transformed and the probes are annotated; GSE132607 and GSE38958 are merged, the batch effect of the merged data is removed with the removeBatchEffect function in the R package limma, and the result is integrated into the chip data training set. GSE28221 is used as the validation set.
In one embodiment, screening the differential genes based on a random forest classifier comprises:
The number of variables of the random forest classifier is set to 18 and the number of trees used to calculate the error rate is set to 1000, and characteristic genes with importance greater than 1 are screened out. The characteristic genes comprise TLR10, GZMK, CD79A, NOG, P2RY10, KLRB1, N6AMT1, EIF1AX, GCNT4, FCRLA, CD40LG, CD69, ABCA13, RNASE3, CEACAM6, USP9Y, OLFM4, BPI, UTY, RPS4Y1 and DDX3Y.
In one embodiment, the regression coefficients of the characteristic genes are: TLR10 is -1.34326, GZMK is 0.30552, CD79A is -1.1445, NOG is -0.54664, P2RY10 is 0.74687, KLRB1 is -1.56761, N6AMT1 is -0.83828, EIF1AX is -0.51833, GCNT4 is -0.27993, FCRLA is 1.70369, CD40LG is -2.0675, CD69 is 0.97147, ABCA13 is 1.54353, RNASE3 is -0.08833, CEACAM6 is 0.48974, USP9Y is 0.97489, OLFM4 is -0.49965, BPI is 0.39967, UTY is 1.07367, RPS4Y1 is -1.06128 and DDX3Y is 0.07629.
In one embodiment, the idiopathic pulmonary fibrosis diagnosis model calculates the diagnosis score as:
$$
\text{diagnosis score} = \frac{1}{1 + e^{-z}},
$$
where each gene symbol denotes the expression level of that gene and
$$
\begin{aligned}
z = \big[ &(-1.34326 \times \mathrm{TLR10}) + (0.30552 \times \mathrm{GZMK}) + (-1.1445 \times \mathrm{CD79A}) \\
&+ (-0.54664 \times \mathrm{NOG}) + (0.74687 \times \mathrm{P2RY10}) + (-1.56761 \times \mathrm{KLRB1}) \\
&+ (-0.83828 \times \mathrm{N6AMT1}) + (-0.51833 \times \mathrm{EIF1AX}) + (-0.27993 \times \mathrm{GCNT4}) \\
&+ (1.70369 \times \mathrm{FCRLA}) + (-2.0675 \times \mathrm{CD40LG}) + (0.97147 \times \mathrm{CD69}) \\
&+ (1.54353 \times \mathrm{ABCA13}) + (-0.08833 \times \mathrm{RNASE3}) + (0.48974 \times \mathrm{CEACAM6}) \\
&+ (0.97489 \times \mathrm{USP9Y}) + (-0.49965 \times \mathrm{OLFM4}) + (0.39967 \times \mathrm{BPI}) \\
&+ (1.07367 \times \mathrm{UTY}) + (-1.06128 \times \mathrm{RPS4Y1}) + (0.07629 \times \mathrm{DDX3Y}) + 32.73267 \big] / 10.
\end{aligned}
$$
A method for constructing an idiopathic pulmonary fibrosis diagnosis model comprises:
Acquiring gene expression profile chip data of IPF patients from the GEO database, and constructing a chip data training set;
Analyzing differentially expressed genes between the IPF and control groups in the chip data training set by Bayesian test, with the screening condition adjusted p-value (p.adj) < 0.05 and |logFC| > 0.5, to screen out differential genes;
Screening characteristic genes from the differential genes based on a random forest classifier;
Fitting a logistic regression model to the training set based on the characteristic genes to obtain regression coefficients of the characteristic genes;
Constructing an idiopathic pulmonary fibrosis diagnosis model according to the expression levels of the characteristic genes and their regression coefficients.
In one embodiment, acquiring the IPF patient gene expression profile chip data from the GEO database and constructing the chip data training set comprises:
Gene expression profile chip data of IPF patients are acquired from the GEO database, comprising GSE132607, GSE38958 and GSE28221. The data are log2-transformed and the probes are annotated; GSE132607 and GSE38958 are merged, the batch effect of the merged data is removed with the removeBatchEffect function in the R package limma, and the result is integrated into the chip data training set. GSE28221 is used as the validation set.
In one embodiment, screening the differential genes based on a random forest classifier comprises:
The number of variables of the random forest classifier is set to 18 and the number of trees used to calculate the error rate is set to 1000, and characteristic genes with importance greater than 1 are screened out. The characteristic genes comprise TLR10, GZMK, CD79A, NOG, P2RY10, KLRB1, N6AMT1, EIF1AX, GCNT4, FCRLA, CD40LG, CD69, ABCA13, RNASE3, CEACAM6, USP9Y, OLFM4, BPI, UTY, RPS4Y1 and DDX3Y.
In one embodiment, the regression coefficients of the characteristic genes are: TLR10 is -1.34326, GZMK is 0.30552, CD79A is -1.1445, NOG is -0.54664, P2RY10 is 0.74687, KLRB1 is -1.56761, N6AMT1 is -0.83828, EIF1AX is -0.51833, GCNT4 is -0.27993, FCRLA is 1.70369, CD40LG is -2.0675, CD69 is 0.97147, ABCA13 is 1.54353, RNASE3 is -0.08833, CEACAM6 is 0.48974, USP9Y is 0.97489, OLFM4 is -0.49965, BPI is 0.39967, UTY is 1.07367, RPS4Y1 is -1.06128 and DDX3Y is 0.07629.
In one embodiment, the idiopathic pulmonary fibrosis diagnosis model calculates the diagnosis score as:
$$
\text{diagnosis score} = \frac{1}{1 + e^{-z}},
$$
where each gene symbol denotes the expression level of that gene and
$$
\begin{aligned}
z = \big[ &(-1.34326 \times \mathrm{TLR10}) + (0.30552 \times \mathrm{GZMK}) + (-1.1445 \times \mathrm{CD79A}) \\
&+ (-0.54664 \times \mathrm{NOG}) + (0.74687 \times \mathrm{P2RY10}) + (-1.56761 \times \mathrm{KLRB1}) \\
&+ (-0.83828 \times \mathrm{N6AMT1}) + (-0.51833 \times \mathrm{EIF1AX}) + (-0.27993 \times \mathrm{GCNT4}) \\
&+ (1.70369 \times \mathrm{FCRLA}) + (-2.0675 \times \mathrm{CD40LG}) + (0.97147 \times \mathrm{CD69}) \\
&+ (1.54353 \times \mathrm{ABCA13}) + (-0.08833 \times \mathrm{RNASE3}) + (0.48974 \times \mathrm{CEACAM6}) \\
&+ (0.97489 \times \mathrm{USP9Y}) + (-0.49965 \times \mathrm{OLFM4}) + (0.39967 \times \mathrm{BPI}) \\
&+ (1.07367 \times \mathrm{UTY}) + (-1.06128 \times \mathrm{RPS4Y1}) + (0.07629 \times \mathrm{DDX3Y}) + 32.73267 \big] / 10.
\end{aligned}
$$
The beneficial effects of the invention are as follows: with the diagnosis model or diagnosis system constructed by the invention, the differential genes of IPF are screened, characteristic genes are then selected from them with a random forest classifier and their regression coefficients are obtained, and the diagnosis score for IPF is calculated with the constructed diagnosis model. This enables rapid screening for idiopathic pulmonary fibrosis and earlier, more accurate and simpler diagnosis of IPF, improving prognosis.
Drawings
FIG. 1 is a schematic diagram of an exemplary system for diagnosing idiopathic pulmonary fibrosis.
FIG. 2 is a volcano plot of differential expression in an embodiment of the present invention.
FIG. 3 is a heat map of the expression levels of the 38 differential genes in the samples in an embodiment of the present invention.
FIG. 4 is a scatter plot of the 38 differential genes in an embodiment of the present invention.
FIG. 5 is a graph of the number of decision trees against the error rate in an embodiment of the present invention.
FIG. 6 is a plot of gene importance in an embodiment of the present invention.
FIG. 7 is the ROC curve of the training set in an embodiment of the present invention.
FIG. 8 is the ROC curve of the validation set in an embodiment of the present invention.
FIG. 9 is a flowchart of a method for constructing a diagnostic model of idiopathic pulmonary fibrosis in an embodiment of the invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples.
It should be noted that these examples are only for illustrating the present invention, and not for limiting the present invention, and simple modifications of the method under the premise of the inventive concept are all within the scope of the claimed invention.
Referring to FIG. 1, an idiopathic pulmonary fibrosis diagnosis system includes a data acquisition module 100, a differential gene screening module 200, a characteristic gene screening module 300, a regression coefficient calculation module 400, a diagnosis model construction module 500, and a diagnosis module 600.
The data acquisition module 100 is used for acquiring gene expression profile chip data of IPF patients from the GEO database and constructing a chip data training set.
Acquiring the IPF patient gene expression profile chip data from the GEO database and constructing the chip data training set comprises the following steps:
Gene expression profile chip data of IPF patients are acquired from the GEO database; the data sets are listed in Table 1 and comprise GSE132607, GSE38958 and GSE28221. The data are log2-transformed and the probes are annotated; GSE132607 and GSE38958 are merged, the batch effect of the merged data is removed with the removeBatchEffect function in the R package limma, and the result is integrated into the chip data training set. GSE28221 is used as the validation set.
Table 1. IPF mRNA expression profile chip data from the GEO database
Data set | IPF | Normal | Platform | Tissue |
GSE132607 | 276 | 0 | Gene Expression Array | PBMC |
GSE38958 | 70 | 45 | Gene Expression Array | PBMC |
GSE28221 | 120 | 19 | Gene Expression Array | PBMC |
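The preprocessing described above (log2 transformation, merging two series, removing the batch effect) can be illustrated with a small Python sketch. This is not the patent's implementation: the patent uses removeBatchEffect from the R package limma, which fits a linear model, whereas the sketch below swaps in plain per-batch mean centering as a simplified stand-in. The function names and the toy matrix are assumptions made for illustration.

```python
import math

def log2_transform(matrix):
    """Apply log2 to every expression value (values must be positive)."""
    return [[math.log2(v) for v in row] for row in matrix]

def center_batches(matrix, batches):
    """Remove per-batch mean shifts, gene by gene.

    matrix: rows are genes, columns are samples.
    batches: one batch label per sample (column).
    Each gene's batch means are shifted to that gene's overall mean.
    This is a simplified stand-in, not limma's removeBatchEffect.
    """
    corrected = []
    for row in matrix:
        overall = sum(row) / len(row)
        by_batch = {}
        for value, batch in zip(row, batches):
            by_batch.setdefault(batch, []).append(value)
        batch_mean = {b: sum(v) / len(v) for b, v in by_batch.items()}
        corrected.append([v - batch_mean[b] + overall
                          for v, b in zip(row, batches)])
    return corrected

# Toy data (hypothetical): 2 genes x 4 samples, two batches standing in
# for the two merged series.
raw = [[4.0, 8.0, 16.0, 32.0],
       [2.0, 2.0, 8.0, 8.0]]
batches = ["GSE132607", "GSE132607", "GSE38958", "GSE38958"]

logged = log2_transform(raw)
training = center_batches(logged, batches)
```

After centering, both batches share the same mean for each gene, so a difference that is purely a batch shift (as in the second toy gene) disappears.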
The differential gene screening module 200 is used for analyzing differentially expressed genes between the IPF and control groups in the chip data training set by Bayesian test, with the screening condition adjusted p-value (p.adj) < 0.05 and |logFC| > 0.5, to screen out differential genes. A total of 38 differential genes were screened out, see FIGS. 2 and 3.
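The screening condition can be sketched in Python as a filter over per-gene test results. The patent does not state which multiple-testing adjustment produces p.adj; the sketch assumes the Benjamini-Hochberg procedure, and the gene names and numbers are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Assumption: the patent does not name the adjustment method.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def screen_genes(results, p_cut=0.05, lfc_cut=0.5):
    """Keep genes with adjusted p < p_cut and |logFC| > lfc_cut.

    results: list of (gene, logFC, raw p-value) tuples.
    """
    padj = bh_adjust([p for _, _, p in results])
    return [gene for (gene, lfc, _), q in zip(results, padj)
            if q < p_cut and abs(lfc) > lfc_cut]

# Hypothetical test results, not taken from the patent.
toy = [("geneA", 1.2, 0.001), ("geneB", -0.8, 0.01),
       ("geneC", 0.3, 0.002), ("geneD", 0.9, 0.4)]
selected = screen_genes(toy)
```

Both conditions must hold: geneC is significant but its fold change is too small, and geneD has a large fold change but is not significant, so neither is kept.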
The characteristic gene screening module 300 is used for screening characteristic genes from the differential genes based on a random forest classifier.
Specifically, the 38 differential genes from the merged GSE132607 and GSE38958 chip data are fed into a random forest classifier. To find the optimal number of variables, random forest classification is run in a loop over all possible numbers of variables and the corresponding error rates are obtained; 18 is finally selected as the optimal number of variables, as shown in FIG. 4. The error rate is then calculated for 1 to 2000 trees; it no longer changes once the number of trees reaches 1000, see FIG. 5. Accordingly, 18 variables and 1000 trees are selected as the final parameters. Subsequently, 21 characteristic genes with importance greater than 1 are screened out, see FIG. 6.
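The two selection steps in this paragraph (pick the parameter value with the lowest error rate, then keep genes whose importance exceeds 1) can be sketched without a random forest library. The sketch below does not train a forest; it assumes the error rates and importance scores have already been computed elsewhere, and every number and name in it is hypothetical.

```python
def best_setting(candidates, error_of):
    """Return the candidate with the lowest error; ties keep the first."""
    return min(candidates, key=error_of)

def select_features(importance, threshold=1.0):
    """Keep genes whose importance is strictly above the threshold,
    ordered from most to least important."""
    return sorted((g for g, s in importance.items() if s > threshold),
                  key=lambda g: -importance[g])

# Hypothetical error rates per number of variables (not the patent's values).
toy_errors = {1: 0.30, 2: 0.22, 3: 0.25}
best_mtry = best_setting(sorted(toy_errors), toy_errors.get)

# Hypothetical importance scores (not the patent's values).
toy_importance = {"geneA": 2.4, "geneB": 0.7, "geneC": 1.1}
kept = select_features(toy_importance)
```

In the patent's run, the equivalent of `best_mtry` is 18 and the equivalent of `kept` is the list of 21 characteristic genes.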
The regression coefficient calculation module 400 is configured to fit a logistic regression model to the training set based on the characteristic genes, so as to obtain regression coefficients of the characteristic genes, as shown in table 2.
Table 2. Regression coefficients of the characteristic genes
Characteristic gene | Regression coefficient | P value |
Intercept | 32.73267 | 0.000102 |
TLR10 | -1.34326 | 0.013703 |
GZMK | 0.30552 | 0.045683 |
CD79A | -1.1445 | 0.063288 |
NOG | -0.54664 | 0.031332 |
P2RY10 | 0.74687 | 0.024423 |
KLRB1 | -1.56761 | 0.009573 |
N6AMT1 | -0.83828 | 0.020989 |
EIF1AX | -0.51833 | 0.046240 |
GCNT4 | -0.27993 | 0.068536 |
FCRLA | 1.70369 | 0.028434 |
CD40LG | -2.0675 | 0.017964 |
CD69 | 0.97147 | 0.033659 |
ABCA13 | 1.54353 | 0.042329 |
RNASE3 | -0.08833 | 0.065271 |
CEACAM6 | 0.48974 | 0.046443 |
USP9Y | 0.97489 | 0.025125 |
OLFM4 | -0.49965 | 0.038818 |
BPI | 0.39967 | 0.046291 |
UTY | 1.07367 | 0.014165 |
RPS4Y1 | -1.06128 | 0.05182 |
DDX3Y | 0.07629 | 0.047289 |
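How a logistic regression fit yields an intercept and per-gene coefficients like those in Table 2 can be shown with a minimal Python sketch. The patent does not describe its fitting procedure; the sketch assumes plain batch gradient ascent on the log-likelihood, uses a single hypothetical gene, and its toy data are not from the patent.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit an intercept and coefficients by batch gradient ascent
    on the mean log-likelihood (an assumed, simplified fitting method)."""
    n = len(X)
    n_features = len(X[0])
    intercept = 0.0
    coefs = [0.0] * n_features
    for _ in range(epochs):
        grad_b = 0.0
        grad_w = [0.0] * n_features
        for row, label in zip(X, y):
            p = sigmoid(intercept + sum(w * x for w, x in zip(coefs, row)))
            err = label - p
            grad_b += err
            for j, x in enumerate(row):
                grad_w[j] += err * x
        intercept += lr * grad_b / n
        coefs = [w + lr * g / n for w, g in zip(coefs, grad_w)]
    return intercept, coefs

# Toy data (hypothetical): one gene, expression tends to be higher in
# cases (label 1) than in controls (label 0), with some overlap.
X = [[1.0], [1.5], [2.0], [2.5], [3.0], [3.5]]
y = [0, 0, 1, 0, 1, 1]
b, w = fit_logistic(X, y)
```

A positive coefficient means higher expression raises the predicted probability of the positive class, and a negative coefficient lowers it, which is how the signs in Table 2 can be read.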
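The diagnosis model construction module 500 is used for constructing an idiopathic pulmonary fibrosis diagnosis model according to the expression levels of the characteristic genes and their regression coefficients.
The diagnosis module 600 is used for calculating a diagnosis score with the idiopathic pulmonary fibrosis diagnosis model based on the expression levels of the characteristic genes of the subject.
The idiopathic pulmonary fibrosis diagnosis model calculates the diagnosis score as:
$$
\text{diagnosis score} = \frac{1}{1 + e^{-z}},
$$
where each gene symbol denotes the expression level of that gene and
$$
\begin{aligned}
z = \big[ &(-1.34326 \times \mathrm{TLR10}) + (0.30552 \times \mathrm{GZMK}) + (-1.1445 \times \mathrm{CD79A}) \\
&+ (-0.54664 \times \mathrm{NOG}) + (0.74687 \times \mathrm{P2RY10}) + (-1.56761 \times \mathrm{KLRB1}) \\
&+ (-0.83828 \times \mathrm{N6AMT1}) + (-0.51833 \times \mathrm{EIF1AX}) + (-0.27993 \times \mathrm{GCNT4}) \\
&+ (1.70369 \times \mathrm{FCRLA}) + (-2.0675 \times \mathrm{CD40LG}) + (0.97147 \times \mathrm{CD69}) \\
&+ (1.54353 \times \mathrm{ABCA13}) + (-0.08833 \times \mathrm{RNASE3}) + (0.48974 \times \mathrm{CEACAM6}) \\
&+ (0.97489 \times \mathrm{USP9Y}) + (-0.49965 \times \mathrm{OLFM4}) + (0.39967 \times \mathrm{BPI}) \\
&+ (1.07367 \times \mathrm{UTY}) + (-1.06128 \times \mathrm{RPS4Y1}) + (0.07629 \times \mathrm{DDX3Y}) + 32.73267 \big] / 10.
\end{aligned}
$$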
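The score calculation can be written directly in Python using the intercept and coefficients from Table 2. The sketch follows the formula as stated, including the division by 10. The function name, the dictionary layout and the all-zero example profile are assumptions for illustration; real inputs would be expression values preprocessed the same way as the training set.

```python
import math

INTERCEPT = 32.73267
COEFFICIENTS = {
    "TLR10": -1.34326, "GZMK": 0.30552, "CD79A": -1.1445, "NOG": -0.54664,
    "P2RY10": 0.74687, "KLRB1": -1.56761, "N6AMT1": -0.83828,
    "EIF1AX": -0.51833, "GCNT4": -0.27993, "FCRLA": 1.70369,
    "CD40LG": -2.0675, "CD69": 0.97147, "ABCA13": 1.54353,
    "RNASE3": -0.08833, "CEACAM6": 0.48974, "USP9Y": 0.97489,
    "OLFM4": -0.49965, "BPI": 0.39967, "UTY": 1.07367,
    "RPS4Y1": -1.06128, "DDX3Y": 0.07629,
}

def diagnostic_score(expression):
    """Return 1 / (1 + exp(-z)) for a dict mapping gene symbol to
    expression level; z is the weighted sum plus intercept, divided by 10."""
    missing = [g for g in COEFFICIENTS if g not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    linear = sum(c * expression[g] for g, c in COEFFICIENTS.items())
    z = (linear + INTERCEPT) / 10
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical all-zero profile, used only to exercise the function.
example = {gene: 0.0 for gene in COEFFICIENTS}
score = diagnostic_score(example)
```

The result always lies strictly between 0 and 1, so it can be compared against a cut-off value chosen from the ROC analysis.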
The ROC curve is used to test the predictive power of the model; an AUC > 0.7 is generally considered to indicate good discrimination. The ROC curve of the training set is shown in FIG. 7: the maximum Youden index is 0.656, the area under the ROC curve (AUC) is 0.893 (95% CI 0.845-0.941), and the optimal cut-off value is 0.875, at which the sensitivity is 0.723 and the specificity is 0.933. The ROC curve of the validation set is shown in FIG. 8. The model has good predictive ability.
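The reported index of 0.656 equals sensitivity + specificity - 1 (0.723 + 0.933 - 1), which is the Youden index. A minimal Python sketch of choosing a cut-off by maximizing that index follows; the scanning rule (predict positive when the score is at or above the threshold) and the toy scores are assumptions, not details given in the patent.

```python
def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0

def best_cutoff(scores, labels):
    """Scan observed scores as thresholds and return the tuple
    (J, threshold, sensitivity, specificity) with the largest J.
    A sample is called positive when its score >= threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels)
                 if l == 1 and s >= threshold)
        tn = sum(1 for s, l in zip(scores, labels)
                 if l == 0 and s < threshold)
        sens = tp / positives
        spec = tn / negatives
        j = youden_index(sens, spec)
        if best is None or j > best[0]:
            best = (j, threshold, sens, spec)
    return best

# Sensitivity and specificity reported for the training set.
reported_j = youden_index(0.723, 0.933)

# Hypothetical scores and labels (1 = IPF, 0 = control).
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
toy_labels = [0, 0, 1, 1, 1, 1]
j, cutoff, sens, spec = best_cutoff(toy_scores, toy_labels)
```

Maximizing J weights sensitivity and specificity equally; a screening use case that prioritizes sensitivity might justify a different cut-off.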
Referring to FIG. 9, a method for constructing an idiopathic pulmonary fibrosis diagnosis model is provided, comprising:
S100, acquiring gene expression profile chip data of IPF patients from the GEO database, and constructing a chip data training set;
S200, analyzing differentially expressed genes between the IPF and control groups in the chip data training set by Bayesian test, with the screening condition adjusted p-value (p.adj) < 0.05 and |logFC| > 0.5, to screen out differential genes;
S300, screening characteristic genes from the differential genes based on a random forest classifier;
S400, fitting a logistic regression model to the training set based on the characteristic genes to obtain regression coefficients of the characteristic genes;
S500, constructing an idiopathic pulmonary fibrosis diagnosis model according to the expression levels of the characteristic genes and their regression coefficients.
Acquiring the IPF patient gene expression profile chip data from the GEO database and constructing the chip data training set comprises the following steps:
Gene expression profile chip data of IPF patients are acquired from the GEO database, comprising GSE132607, GSE38958 and GSE28221. The data are log2-transformed and the probes are annotated; GSE132607 and GSE38958 are merged, the batch effect of the merged data is removed with the removeBatchEffect function in the R package limma, and the result is integrated into the chip data training set. GSE28221 is used as the validation set.
Screening characteristic genes from the differential genes based on a random forest classifier comprises the following steps:
The number of variables of the random forest classifier is set to 18 and the number of trees used to calculate the error rate is set to 1000, and characteristic genes with importance greater than 1 are screened out. The characteristic genes comprise TLR10, GZMK, CD79A, NOG, P2RY10, KLRB1, N6AMT1, EIF1AX, GCNT4, FCRLA, CD40LG, CD69, ABCA13, RNASE3, CEACAM6, USP9Y, OLFM4, BPI, UTY, RPS4Y1 and DDX3Y.
The regression coefficients of the characteristic genes are: TLR10 is -1.34326, GZMK is 0.30552, CD79A is -1.1445, NOG is -0.54664, P2RY10 is 0.74687, KLRB1 is -1.56761, N6AMT1 is -0.83828, EIF1AX is -0.51833, GCNT4 is -0.27993, FCRLA is 1.70369, CD40LG is -2.0675, CD69 is 0.97147, ABCA13 is 1.54353, RNASE3 is -0.08833, CEACAM6 is 0.48974, USP9Y is 0.97489, OLFM4 is -0.49965, BPI is 0.39967, UTY is 1.07367, RPS4Y1 is -1.06128 and DDX3Y is 0.07629.
The idiopathic pulmonary fibrosis diagnosis model calculates the diagnosis score as:
$$
\text{diagnosis score} = \frac{1}{1 + e^{-z}},
$$
where each gene symbol denotes the expression level of that gene and
$$
\begin{aligned}
z = \big[ &(-1.34326 \times \mathrm{TLR10}) + (0.30552 \times \mathrm{GZMK}) + (-1.1445 \times \mathrm{CD79A}) \\
&+ (-0.54664 \times \mathrm{NOG}) + (0.74687 \times \mathrm{P2RY10}) + (-1.56761 \times \mathrm{KLRB1}) \\
&+ (-0.83828 \times \mathrm{N6AMT1}) + (-0.51833 \times \mathrm{EIF1AX}) + (-0.27993 \times \mathrm{GCNT4}) \\
&+ (1.70369 \times \mathrm{FCRLA}) + (-2.0675 \times \mathrm{CD40LG}) + (0.97147 \times \mathrm{CD69}) \\
&+ (1.54353 \times \mathrm{ABCA13}) + (-0.08833 \times \mathrm{RNASE3}) + (0.48974 \times \mathrm{CEACAM6}) \\
&+ (0.97489 \times \mathrm{USP9Y}) + (-0.49965 \times \mathrm{OLFM4}) + (0.39967 \times \mathrm{BPI}) \\
&+ (1.07367 \times \mathrm{UTY}) + (-1.06128 \times \mathrm{RPS4Y1}) + (0.07629 \times \mathrm{DDX3Y}) + 32.73267 \big] / 10.
\end{aligned}
$$
Finally, it is noted that the above embodiments illustrate rather than limit the invention, and those skilled in the art will understand that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (4)
1. The idiopathic pulmonary fibrosis diagnosis system is characterized by comprising a data acquisition module, a differential gene screening module, a characteristic gene screening module, a regression coefficient calculation module, a diagnosis model construction module and a diagnosis module;
The data acquisition module is used for acquiring the gene expression profile chip data of the IPF patient through the GEO database and constructing a chip data training set;
The differential gene screening module is used for analyzing differentially expressed genes between the IPF and control groups in the chip data training set by Bayesian test, with the screening condition adjusted p-value (p.adj) < 0.05 and |logFC| > 0.5, to screen out differential genes;
the characteristic gene screening module is used for screening the differential genes based on a random forest classifier to obtain characteristic genes;
The regression coefficient calculation module is used for fitting a logistic regression model in a training set based on the characteristic genes to obtain regression coefficients of the characteristic genes;
the diagnosis model construction module is used for constructing an idiopathic pulmonary fibrosis diagnosis model according to the expression quantity of the characteristic genes and the regression coefficient thereof;
the diagnosis module is used for calculating a diagnosis score through the idiopathic pulmonary fibrosis diagnosis model based on the expression quantity of the characteristic genes of the person to be detected;
The method for obtaining the gene expression profile chip data of the IPF patient through the GEO database and constructing the chip data training set comprises the following steps:
acquiring gene expression profile chip data of IPF patients from the GEO database, wherein the data comprise GSE132607, GSE38958 and GSE28221; performing log2 transformation on the data, annotating the probes, merging GSE132607 and GSE38958, removing the batch effect of the merged data with the removeBatchEffect function in the R package limma, and integrating to obtain the chip data training set, wherein GSE28221 is used as the validation set;
Screening the differential genes based on a random forest classifier to obtain characteristic genes comprises the following steps:
the number of variables of the random forest classifier is set to 18, the number of trees used to calculate the error rate is set to 1000, and characteristic genes with importance greater than 1 are screened out, wherein the characteristic genes comprise TLR10, GZMK, CD79A, NOG, P2RY10, KLRB1, N6AMT1, EIF1AX, GCNT4, FCRLA, CD40LG, CD69, ABCA13, RNASE3, CEACAM6, USP9Y, OLFM4, BPI, UTY, RPS4Y1 and DDX3Y;
Regression coefficients of the characteristic genes: TLR10 is -1.34326, GZMK is 0.30552, CD79A is -1.1445, NOG is -0.54664, P2RY10 is 0.74687, KLRB1 is -1.56761, N6AMT1 is -0.83828, EIF1AX is -0.51833, GCNT4 is -0.27993, FCRLA is 1.70369, CD40LG is -2.0675, CD69 is 0.97147, ABCA13 is 1.54353, RNASE3 is -0.08833, CEACAM6 is 0.48974, USP9Y is 0.97489, OLFM4 is -0.49965, BPI is 0.39967, UTY is 1.07367, RPS4Y1 is -1.06128 and DDX3Y is 0.07629.
2. The idiopathic pulmonary fibrosis diagnostic system of claim 1, wherein the idiopathic pulmonary fibrosis diagnostic model calculates a diagnostic score by:
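$$
\text{diagnosis score} = \frac{1}{1 + e^{-z}},
$$
wherein
$$
\begin{aligned}
z = \big[ &(-1.34326 \times \mathrm{TLR10}) + (0.30552 \times \mathrm{GZMK}) + (-1.1445 \times \mathrm{CD79A}) \\
&+ (-0.54664 \times \mathrm{NOG}) + (0.74687 \times \mathrm{P2RY10}) + (-1.56761 \times \mathrm{KLRB1}) \\
&+ (-0.83828 \times \mathrm{N6AMT1}) + (-0.51833 \times \mathrm{EIF1AX}) + (-0.27993 \times \mathrm{GCNT4}) \\
&+ (1.70369 \times \mathrm{FCRLA}) + (-2.0675 \times \mathrm{CD40LG}) + (0.97147 \times \mathrm{CD69}) \\
&+ (1.54353 \times \mathrm{ABCA13}) + (-0.08833 \times \mathrm{RNASE3}) + (0.48974 \times \mathrm{CEACAM6}) \\
&+ (0.97489 \times \mathrm{USP9Y}) + (-0.49965 \times \mathrm{OLFM4}) + (0.39967 \times \mathrm{BPI}) \\
&+ (1.07367 \times \mathrm{UTY}) + (-1.06128 \times \mathrm{RPS4Y1}) + (0.07629 \times \mathrm{DDX3Y}) + 32.73267 \big] / 10.
\end{aligned}
$$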
3. A method for constructing an idiopathic pulmonary fibrosis diagnosis model, characterized by comprising the following steps:
Acquiring gene expression profile chip data of an IPF patient through a GEO database, and constructing a chip data training set;
Analyzing differentially expressed genes between the IPF and control groups in the chip data training set by Bayesian test, with the screening condition adjusted p-value (p.adj) < 0.05 and |logFC| > 0.5, to screen out differential genes;
Screening out characteristic genes of the differential genes based on a random forest classifier;
fitting a logistic regression model in a training set based on the characteristic genes to obtain regression coefficients of the characteristic genes;
Constructing an idiopathic pulmonary fibrosis diagnosis model according to the expression levels of the characteristic genes and their regression coefficients;
The method for obtaining the gene expression profile chip data of the IPF patient through the GEO database and constructing the chip data training set comprises the following steps:
acquiring gene expression profile chip data of IPF patients from the GEO database, wherein the data comprise GSE132607, GSE38958 and GSE28221; performing log2 transformation on the data, annotating the probes, merging GSE132607 and GSE38958, removing the batch effect of the merged data with the removeBatchEffect function in the R package limma, and integrating to obtain the chip data training set, wherein GSE28221 is used as the validation set;
Screening the differential genes based on a random forest classifier to obtain characteristic genes comprises the following steps:
the number of variables of the random forest classifier is set to 18, the number of trees used to calculate the error rate is set to 1000, and characteristic genes with importance greater than 1 are screened out, wherein the characteristic genes comprise TLR10, GZMK, CD79A, NOG, P2RY10, KLRB1, N6AMT1, EIF1AX, GCNT4, FCRLA, CD40LG, CD69, ABCA13, RNASE3, CEACAM6, USP9Y, OLFM4, BPI, UTY, RPS4Y1 and DDX3Y;
Regression coefficients of the characteristic genes: TLR10 is -1.34326, GZMK is 0.30552, CD79A is -1.1445, NOG is -0.54664, P2RY10 is 0.74687, KLRB1 is -1.56761, N6AMT1 is -0.83828, EIF1AX is -0.51833, GCNT4 is -0.27993, FCRLA is 1.70369, CD40LG is -2.0675, CD69 is 0.97147, ABCA13 is 1.54353, RNASE3 is -0.08833, CEACAM6 is 0.48974, USP9Y is 0.97489, OLFM4 is -0.49965, BPI is 0.39967, UTY is 1.07367, RPS4Y1 is -1.06128 and DDX3Y is 0.07629.
4. The method of constructing an idiopathic pulmonary fibrosis diagnostic model of claim 3, wherein the idiopathic pulmonary fibrosis diagnostic model calculates a diagnostic score by:
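$$
\text{diagnosis score} = \frac{1}{1 + e^{-z}};
$$
wherein
$$
\begin{aligned}
z = \big[ &(-1.34326 \times \mathrm{TLR10}) + (0.30552 \times \mathrm{GZMK}) + (-1.1445 \times \mathrm{CD79A}) \\
&+ (-0.54664 \times \mathrm{NOG}) + (0.74687 \times \mathrm{P2RY10}) + (-1.56761 \times \mathrm{KLRB1}) \\
&+ (-0.83828 \times \mathrm{N6AMT1}) + (-0.51833 \times \mathrm{EIF1AX}) + (-0.27993 \times \mathrm{GCNT4}) \\
&+ (1.70369 \times \mathrm{FCRLA}) + (-2.0675 \times \mathrm{CD40LG}) + (0.97147 \times \mathrm{CD69}) \\
&+ (1.54353 \times \mathrm{ABCA13}) + (-0.08833 \times \mathrm{RNASE3}) + (0.48974 \times \mathrm{CEACAM6}) \\
&+ (0.97489 \times \mathrm{USP9Y}) + (-0.49965 \times \mathrm{OLFM4}) + (0.39967 \times \mathrm{BPI}) \\
&+ (1.07367 \times \mathrm{UTY}) + (-1.06128 \times \mathrm{RPS4Y1}) + (0.07629 \times \mathrm{DDX3Y}) + 32.73267 \big] / 10.
\end{aligned}
$$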
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410189821.8A CN117747093B (en) | 2024-02-20 | 2024-02-20 | Method for constructing idiopathic pulmonary fibrosis diagnosis model and diagnosis system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117747093A (en) | 2024-03-22 |
CN117747093B (en) | 2024-06-07 |
Family
ID=90251206
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410189821.8A Active CN117747093B (en) | 2024-02-20 | 2024-02-20 | Method for constructing idiopathic pulmonary fibrosis diagnosis model and diagnosis system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117747093B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014144564A2 (en) * | 2013-03-15 | 2014-09-18 | Veracyte, Inc. | Biomarkers for diagnosis of lung diseases and methods of use thereof |
CN107099581A (en) * | 2012-03-27 | 2017-08-29 | 弗·哈夫曼-拉罗切有限公司 | The method of prediction, diagnosis and treatment idiopathic pulmonary fibrosis |
CN114283885A (en) * | 2021-12-25 | 2022-04-05 | 重庆医科大学 | Method for constructing diagnosis model of prostate cancer |
CN114864003A (en) * | 2022-03-17 | 2022-08-05 | 中国科学院深圳先进技术研究院 | Differential analysis method and system based on single cell samples of mixed experimental group and control group |
CN115261454A (en) * | 2022-04-20 | 2022-11-01 | 合肥市传染病医院(合肥市第六人民医院) | Novel let-7d-5p and miR-140-5p biomarker panel diagnosis method |
CN117497062A (en) * | 2023-11-15 | 2024-02-02 | 广州瑞能精准医学科技有限公司 | Method for constructing idiopathic pulmonary fibrosis plasma cell characteristic gene prognosis model |
Non-Patent Citations (2)
Title |
---|
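"Genetic screening study of idiopathic pulmonary fibrosis and a preliminary exploration of the mechanism of its acute exacerbation"; Fan Shanshan (范珊珊); China Master's Theses Full-text Database (Medicine and Health Sciences); 2021-08-15 (No. 08); p. E063-13 * |
"Screening and bioinformatics analysis of genes related to idiopathic pulmonary fibrosis"; Xing Jing (邢静); Huang Xinyan (黄鑫炎); Guo Yubiao (郭禹标); Journal of Sun Yat-sen University (Medical Sciences); 2017-11-15 (No. 06); pp. 131-135+142 * |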
Also Published As
Publication number | Publication date |
---|---|
CN117747093A (en) | 2024-03-22 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN112951406A (en) | Lung cancer prognosis auxiliary evaluation method and system based on CT (computed tomography) radiomics | |
CN112635056A (en) | Lasso-based esophageal squamous carcinoma patient risk prediction nomogram model establishing method | |
CN112641451B (en) | Multi-scale residual network sleep staging method and system based on single-channel electroencephalogram signal | |
CN113989833B (en) | EfficientNet-based oral mucosa disease identification method | |
CN112950614A (en) | Breast cancer detection method based on multi-scale dilated convolution | |
CN110991536A (en) | Training method of early warning model of primary liver cancer | |
CN111833330B (en) | Lung cancer intelligent detection method and system based on fusion of image and machine olfaction | |
CN113593708A (en) | Sepsis prognosis prediction method based on integrated learning algorithm | |
CN117672522A (en) | A method for survival prediction of osteosarcoma based on machine learning model | |
CN114121286A (en) | Exhaled breath detection-based disease risk assessment method and device and related products | |
CN118799648A (en) | A deep learning medical image processing classification system combined with multimodal big data | |
CN117747093B (en) | Method for constructing idiopathic pulmonary fibrosis diagnosis model and diagnosis system | |
CN111748634A (en) | A combination of characteristic lincRNA expression profiles and an early prediction method for colon cancer | |
CN115691813A (en) | Genetic gastric cancer assessment method and system based on genomics and microbiomics | |
CN118538416A (en) | Method for predicting colorectal cancer distant metastasis state | |
CN112690815A (en) | System and method for assisting in diagnosing lesion grade based on lung image report | |
CN118039116A (en) | Method and system for constructing gestational diabetes judgment model based on machine learning | |
CN116628557A (en) | Method and device for assessing organic bowel disease type based on belief rule base reasoning | |
CN113989543B (en) | A COVID-19 medical image detection and classification method and device | |
CN115810122A (en) | SPECT/CT-based deep learning method for detecting activity of thyroid-related ophthalmopathy | |
CN110530816B (en) | Method for early diagnosis of rice blast by using infrared photoacoustic spectrum | |
CN113345525A (en) | Analysis method for reducing influence of covariates on detection result in high-throughput detection | |
CN119991669B (en) | Deep learning-based children pneumonia lung image analysis and evaluation system and method | |
CN112489038A (en) | Fuzzy model breast cancer diagnosis method based on fuzzy clustering and generalized least square method | |
CN114878832B (en) | Idiopathic pulmonary fibrosis plasma protein markers and their use in preparing detection reagents or diagnostic tools |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |