CN117747093B - Method for constructing idiopathic pulmonary fibrosis diagnosis model and diagnosis system - Google Patents

Info

Publication number
CN117747093B
Authority
CN
China
Prior art keywords
genes
chip data
module
diagnosis
screening
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410189821.8A
Other languages
Chinese (zh)
Other versions
CN117747093A (en)
Inventor
酒连娣
吕彬彬
郭栋梁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digital Health China Technologies Co Ltd
Original Assignee
Digital Health China Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Digital Health China Technologies Co Ltd
Priority to CN202410189821.8A
Publication of CN117747093A
Application granted
Publication of CN117747093B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the technical field of genes, and in particular to a method for constructing an idiopathic pulmonary fibrosis diagnosis model and a diagnosis system. The system comprises a data acquisition module, a differential gene screening module, a characteristic gene screening module, a regression coefficient calculation module, a diagnosis model construction module and a diagnosis module. The data acquisition module acquires gene expression profile chip data of IPF patients from the GEO database and constructs a chip data training set. The differential gene screening module screens differential genes from the chip data training set by a Bayesian test. The characteristic gene screening module screens characteristic genes with a random forest classifier. The regression coefficient calculation module fits a logistic regression model on the training set based on the characteristic genes to obtain their regression coefficients. The diagnosis model construction module constructs the idiopathic pulmonary fibrosis diagnosis model, and the diagnosis module calculates a diagnosis score with that model from the expression levels of the characteristic genes of the person to be tested. The invention enables rapid screening of idiopathic pulmonary fibrosis and earlier, more accurate and simpler diagnosis of IPF, improving prognosis.

Description

Method for constructing idiopathic pulmonary fibrosis diagnosis model and diagnosis system
Technical Field
The invention relates to the technical field of genes, in particular to a construction method and a diagnosis system of an idiopathic pulmonary fibrosis diagnosis model.
Background
Pulmonary fibrosis is the end-stage change of a large class of lung diseases characterized by fibroblast proliferation and massive accumulation of extracellular matrix, accompanied by inflammatory injury and destruction of tissue architecture; that is, normal alveolar tissue is repaired abnormally after damage, leading to structural abnormalities (scar formation). In most patients the cause of pulmonary fibrosis is unknown (idiopathic); this group of diseases is called idiopathic interstitial pneumonia (IIP) and is a major category of interstitial pneumonia. Among the IIPs, the most common type whose main manifestation is pulmonary fibrosis is idiopathic pulmonary fibrosis (IPF), a serious interstitial lung disease that can lead to progressive loss of lung function. Pulmonary fibrosis severely impairs respiratory function, manifesting as dry cough and progressive dyspnea (a sensation of not getting enough air), and the patient's respiratory function deteriorates continuously as the disease and the lung injury worsen. The incidence and mortality of idiopathic pulmonary fibrosis are increasing year by year; the average survival after diagnosis is only 2.8 years and the mortality is higher than that of most tumors, so IPF is described as a "tumor-like disease".
Diagnosing IPF requires high-resolution CT (HRCT), and some cases also require a lung biopsy. IPF is often overlooked at initial presentation because it clinically resembles other diseases such as bronchitis, asthma and heart failure. Most patients have already reached the middle or late stage of the disease when diagnosed, and the disease continues to worsen even with treatment.
Therefore, it is highly desirable to build a diagnostic model that facilitates earlier, more accurate, simpler diagnosis of IPF, improving prognosis.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a method for constructing an idiopathic pulmonary fibrosis diagnosis model and a diagnosis system, so as to address the problem that IPF is often overlooked at initial presentation and diagnosed only at the middle or late stage.
In order to solve the problems, the invention adopts the following technical scheme:
The idiopathic pulmonary fibrosis diagnosis system comprises a data acquisition module, a differential gene screening module, a characteristic gene screening module, a regression coefficient calculation module, a diagnosis model construction module and a diagnosis module;
The data acquisition module is used for acquiring the gene expression profile chip data of the IPF patient through the GEO database and constructing a chip data training set;
The differential gene screening module is used for analyzing the differentially expressed genes between the IPF and control groups by a Bayesian test using the chip data training set, with the screening condition p.adj < 0.05 and |logFC| > 0.5, and screening out the differential genes;
the characteristic gene screening module is used for screening the differential genes based on a random forest classifier to obtain characteristic genes;
The regression coefficient calculation module is used for fitting a logistic regression model in a training set based on the characteristic genes to obtain regression coefficients of the characteristic genes;
the diagnosis model construction module is used for constructing an idiopathic pulmonary fibrosis diagnosis model according to the expression quantity of the characteristic genes and the regression coefficient thereof;
the diagnosis module is used for calculating a diagnosis score through the idiopathic pulmonary fibrosis diagnosis model based on the expression quantity of the characteristic genes of the person to be detected.
In one embodiment, acquiring the IPF patient gene expression profile chip data from the GEO database and constructing the chip data training set comprises:
Gene expression profile chip data of IPF patients are acquired from the GEO database, comprising the datasets GSE132607, GSE38958 and GSE28221. The data are log2-transformed and the probes are annotated; GSE132607 and GSE38958 are then merged, the batch effect of the merged data is removed with the removeBatchEffect function of the R package limma, and the result is integrated into the chip data training set. GSE28221 is used as the validation set.
In one embodiment, screening the characteristic genes from the differential genes based on a random forest classifier comprises:
The number of variables of the random forest classifier is set to 18 and the number of trees to 1000, and characteristic genes with an importance greater than 1 are screened out. The characteristic genes comprise TLR10, GZMK, CD79A, NOG, P2RY10, KLRB1, N6AMT1, EIF1AX, GCNT4, FCRLA, CD40LG, CD69, ABCA13, RNASE3, CEACAM6, USP9Y, OLFM4, BPI, UTY, RPS4Y1 and DDX3Y.
In one embodiment, the regression coefficients of the characteristic genes are: TLR10 is -1.34326, GZMK is 0.30552, CD79A is -1.1445, NOG is -0.54664, P2RY10 is 0.74687, KLRB1 is -1.56761, N6AMT1 is -0.83828, EIF1AX is -0.51833, GCNT4 is -0.27993, FCRLA is 1.70369, CD40LG is -2.0675, CD69 is 0.97147, ABCA13 is 1.54353, RNASE3 is -0.08833, CEACAM6 is 0.48974, USP9Y is 0.97489, OLFM4 is -0.49965, BPI is 0.39967, UTY is 1.07367, RPS4Y1 is -1.06128 and DDX3Y is 0.07629.
In one embodiment, the idiopathic pulmonary fibrosis diagnosis model calculates a diagnosis score by:
1/(1+exp(-z)), where z = [(-1.34326×TLR10)+(0.30552×GZMK)+(-1.1445×CD79A)+(-0.54664×NOG)+(0.74687×P2RY10)+(-1.56761×KLRB1)+(-0.83828×N6AMT1)+(-0.51833×EIF1AX)+(-0.27993×GCNT4)+(1.70369×FCRLA)+(-2.0675×CD40LG)+(0.97147×CD69)+(1.54353×ABCA13)+(-0.08833×RNASE3)+(0.48974×CEACAM6)+(0.97489×USP9Y)+(-0.49965×OLFM4)+(0.39967×BPI)+(1.07367×UTY)+(-1.06128×RPS4Y1)+(0.07629×DDX3Y)+32.73267]/10.
A method for constructing an idiopathic pulmonary fibrosis diagnosis model comprises:
Acquiring gene expression profile chip data of IPF patients from the GEO database, and constructing a chip data training set;
Analyzing the differentially expressed genes between the IPF and control groups by a Bayesian test using the chip data training set, with the screening condition p.adj < 0.05 and |logFC| > 0.5, and screening out the differential genes;
Screening characteristic genes from the differential genes based on a random forest classifier;
Fitting a logistic regression model on the training set based on the characteristic genes to obtain regression coefficients of the characteristic genes;
And constructing an idiopathic pulmonary fibrosis diagnosis model according to the expression levels of the characteristic genes and their regression coefficients.
In one embodiment, acquiring the IPF patient gene expression profile chip data from the GEO database and constructing the chip data training set comprises:
Gene expression profile chip data of IPF patients are acquired from the GEO database, comprising the datasets GSE132607, GSE38958 and GSE28221. The data are log2-transformed and the probes are annotated; GSE132607 and GSE38958 are then merged, the batch effect of the merged data is removed with the removeBatchEffect function of the R package limma, and the result is integrated into the chip data training set. GSE28221 is used as the validation set.
In one embodiment, screening the characteristic genes from the differential genes based on a random forest classifier comprises:
The number of variables of the random forest classifier is set to 18 and the number of trees to 1000, and characteristic genes with an importance greater than 1 are screened out. The characteristic genes comprise TLR10, GZMK, CD79A, NOG, P2RY10, KLRB1, N6AMT1, EIF1AX, GCNT4, FCRLA, CD40LG, CD69, ABCA13, RNASE3, CEACAM6, USP9Y, OLFM4, BPI, UTY, RPS4Y1 and DDX3Y.
In one embodiment, the regression coefficients of the characteristic genes are: TLR10 is -1.34326, GZMK is 0.30552, CD79A is -1.1445, NOG is -0.54664, P2RY10 is 0.74687, KLRB1 is -1.56761, N6AMT1 is -0.83828, EIF1AX is -0.51833, GCNT4 is -0.27993, FCRLA is 1.70369, CD40LG is -2.0675, CD69 is 0.97147, ABCA13 is 1.54353, RNASE3 is -0.08833, CEACAM6 is 0.48974, USP9Y is 0.97489, OLFM4 is -0.49965, BPI is 0.39967, UTY is 1.07367, RPS4Y1 is -1.06128 and DDX3Y is 0.07629.
In one embodiment, the idiopathic pulmonary fibrosis diagnosis model calculates a diagnosis score by:
1/(1+exp(-z)), where z = [(-1.34326×TLR10)+(0.30552×GZMK)+(-1.1445×CD79A)+(-0.54664×NOG)+(0.74687×P2RY10)+(-1.56761×KLRB1)+(-0.83828×N6AMT1)+(-0.51833×EIF1AX)+(-0.27993×GCNT4)+(1.70369×FCRLA)+(-2.0675×CD40LG)+(0.97147×CD69)+(1.54353×ABCA13)+(-0.08833×RNASE3)+(0.48974×CEACAM6)+(0.97489×USP9Y)+(-0.49965×OLFM4)+(0.39967×BPI)+(1.07367×UTY)+(-1.06128×RPS4Y1)+(0.07629×DDX3Y)+32.73267]/10.
The invention has the following beneficial effects: with the diagnosis model or diagnosis system constructed by the invention, the differential genes of IPF are screened, the characteristic genes are then selected from them with a random forest classifier and their regression coefficients are obtained, and the diagnosis score is calculated with the constructed diagnosis model. This enables rapid screening of idiopathic pulmonary fibrosis and earlier, more accurate and simpler diagnosis of IPF, improving prognosis.
Drawings
FIG. 1 is a schematic diagram of an idiopathic pulmonary fibrosis diagnosis system according to an embodiment of the present invention.
FIG. 2 is a volcano plot of differential expression according to an embodiment of the present invention.
FIG. 3 is a heat map of the expression levels of the 38 differential genes in the samples according to an embodiment of the present invention.
FIG. 4 is a scatter plot for the 38 differential genes according to an embodiment of the present invention.
FIG. 5 is a graph of the error rate against the number of decision trees according to an embodiment of the present invention.
FIG. 6 is a plot of gene importance according to an embodiment of the present invention.
FIG. 7 is the ROC curve of the training set according to an embodiment of the present invention.
FIG. 8 is the ROC curve of the validation set according to an embodiment of the present invention.
FIG. 9 is a flowchart of a method for constructing an idiopathic pulmonary fibrosis diagnosis model according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples.
It should be noted that these examples are only for illustrating the present invention, and not for limiting the present invention, and simple modifications of the method under the premise of the inventive concept are all within the scope of the claimed invention.
Referring to fig. 1, a diagnosis system for idiopathic pulmonary fibrosis includes a data acquisition module 100, a differential gene screening module 200, a characteristic gene screening module 300, a regression coefficient calculation module 400, a diagnosis model construction module 500, and a diagnosis module 600;
The data acquisition module 100 is used for acquiring the gene expression profile chip data of IPF patients from the GEO database and constructing a chip data training set.
Acquiring the IPF patient gene expression profile chip data from the GEO database and constructing the chip data training set comprises:
Gene expression profile chip data of IPF patients are acquired from the GEO database; the data are listed in Table 1 and comprise the datasets GSE132607, GSE38958 and GSE28221. The data are log2-transformed and the probes are annotated; GSE132607 and GSE38958 are then merged, the batch effect of the merged data is removed with the removeBatchEffect function of the R package limma, and the result is integrated into the chip data training set. GSE28221 is used as the validation set.
Table 1. IPF mRNA expression profile chip data from the GEO database
Dataset     IPF   Normal   Platform                Tissue
GSE132607   276   0        Gene Expression Array   PBMC
GSE38958    70    45       Gene Expression Array   PBMC
GSE28221    120   19       Gene Expression Array   PBMC
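The preprocessing described above (log2 transformation, then batch-effect removal on the merged data) can be illustrated with a minimal Python sketch. It is a simplified stand-in and not the limma algorithm: removeBatchEffect fits a linear model, whereas this sketch only re-centers each batch on the overall mean for a single gene. The function names, the pseudocount and the intensity values are assumptions made for illustration.

```python
import math

def log2_transform(values, pseudocount=1.0):
    """Return log2(x + pseudocount) for each raw intensity.

    The pseudocount is an assumption of this sketch; the source only says
    the data are log2-transformed.
    """
    return [math.log2(v + pseudocount) for v in values]

def center_batches(expr, batches):
    """Shift each batch of one gene so its mean equals the overall mean.

    expr:    log2 expression values of a single gene across samples
    batches: batch label of each sample, same length as expr
    """
    overall = sum(expr) / len(expr)
    sums, counts = {}, {}
    for x, b in zip(expr, batches):
        sums[b] = sums.get(b, 0.0) + x
        counts[b] = counts.get(b, 0) + 1
    means = {b: sums[b] / counts[b] for b in sums}
    return [x - means[b] + overall for x, b in zip(expr, batches)]

# Made-up intensities for one gene in four samples from the two merged datasets.
raw = [15.0, 31.0, 63.0, 127.0]
batches = ["GSE132607", "GSE132607", "GSE38958", "GSE38958"]

logged = log2_transform(raw)
corrected = center_batches(logged, batches)
print(logged)
print(corrected)
```

After centering, both batches share the same mean, so the remaining variation within each batch is what the later steps see.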
The differential gene screening module 200 is used for analyzing the differentially expressed genes between the IPF and control groups by a Bayesian test using the chip data training set, with the screening condition p.adj < 0.05 and |logFC| > 0.5, and screening out the differential genes. 38 differential genes were screened out, see FIGS. 2 and 3.
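The screening condition can be sketched in Python as follows. The sketch starts from per-gene p-values and log fold changes that are assumed to have been computed already, so it does not perform the Bayesian test itself. The Benjamini-Hochberg adjustment is an assumption: the source gives only the name p.adj and does not state the adjustment method. Gene names and numbers are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def screen_differential_genes(genes, logfc, pvalues, p_cut=0.05, fc_cut=0.5):
    """Keep genes with adjusted p < p_cut and |logFC| > fc_cut."""
    padj = benjamini_hochberg(pvalues)
    return [g for g, fc, p in zip(genes, logfc, padj)
            if p < p_cut and abs(fc) > fc_cut]

# Hypothetical input for four genes.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
logfc = [1.2, -0.8, 0.3, 0.9]
pvalues = [0.001, 0.01, 0.02, 0.6]

selected = screen_differential_genes(genes, logfc, pvalues)
print(selected)
```

GENE_C fails on fold change and GENE_D fails on the adjusted p-value, so only the first two genes pass both conditions.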
The characteristic gene screening module 300 is used for screening the differential genes based on a random forest classifier.
Specifically, the 38 differential genes of the chip data merged from GSE132607 and GSE38958 are fed into a random forest classifier. To find the optimal number of variables, the random forest classification is run in a loop over every possible number of variables and the corresponding error rate is recorded; 18 is finally selected as the optimal number of variables, as shown in FIG. 4. The error rate is then calculated for 1 to 2000 trees; it no longer changes once the number of trees reaches 1000, see FIG. 5. Finally, 18 variables and 1000 trees are selected as the final parameters. Subsequently, 21 characteristic genes with an importance greater than 1 are screened out, see FIG. 6.
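The two selection rules in this paragraph (pick the number of variables with the lowest error rate, then keep genes whose importance exceeds 1) can be sketched in Python. The sketch does not train a random forest; it assumes the error rates and importance scores have already been produced by one. All names and numbers are hypothetical.

```python
def best_variable_count(error_by_count):
    """Return the variable count with the lowest error rate.

    Ties are resolved in favour of the smallest count.
    """
    return min(sorted(error_by_count), key=lambda k: error_by_count[k])

def select_by_importance(importance, threshold=1.0):
    """Return genes with importance strictly above threshold, highest first."""
    kept = [g for g, score in importance.items() if score > threshold]
    return sorted(kept, key=lambda g: -importance[g])

# Hypothetical error rates for a few candidate variable counts.
error_by_count = {6: 0.21, 12: 0.18, 18: 0.15, 24: 0.15}
# Hypothetical importance scores.
importance = {"GENE_A": 3.2, "GENE_B": 0.7, "GENE_C": 1.0, "GENE_D": 1.4}

print(best_variable_count(error_by_count))
print(select_by_importance(importance))
```

A gene with an importance of exactly 1 is dropped here because the source states "greater than 1".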
The regression coefficient calculation module 400 is configured to fit a logistic regression model to the training set based on the characteristic genes, so as to obtain regression coefficients of the characteristic genes, as shown in table 2.
Table 2. Regression coefficients of the characteristic genes
Characteristic gene   Regression coefficient   P value
Intercept             32.73267                 0.000102
TLR10                 -1.34326                 0.013703
GZMK                  0.30552                  0.045683
CD79A                 -1.1445                  0.063288
NOG                   -0.54664                 0.031332
P2RY10                0.74687                  0.024423
KLRB1                 -1.56761                 0.009573
N6AMT1                -0.83828                 0.020989
EIF1AX                -0.51833                 0.046240
GCNT4                 -0.27993                 0.068536
FCRLA                 1.70369                  0.028434
CD40LG                -2.0675                  0.017964
CD69                  0.97147                  0.033659
ABCA13                1.54353                  0.042329
RNASE3                -0.08833                 0.065271
CEACAM6               0.48974                  0.046443
USP9Y                 0.97489                  0.025125
OLFM4                 -0.49965                 0.038818
BPI                   0.39967                  0.046291
UTY                   1.07367                  0.014165
RPS4Y1                -1.06128                 0.05182
DDX3Y                 0.07629                  0.047289
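How a logistic regression fit yields an intercept and one coefficient per gene, as in Table 2, can be illustrated with a minimal Python sketch. It uses plain batch gradient ascent on the log-likelihood, which is an assumption of this sketch; the source does not say which fitting procedure or software was used. The two-feature toy data are made up and unrelated to the patent's expression data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit an intercept and coefficients by batch gradient ascent.

    X: list of feature rows, y: list of 0/1 labels.
    """
    n_features = len(X[0])
    intercept = 0.0
    coefs = [0.0] * n_features
    for _ in range(epochs):
        grad_b = 0.0
        grad_w = [0.0] * n_features
        for row, label in zip(X, y):
            p = sigmoid(intercept + sum(w * x for w, x in zip(coefs, row)))
            err = label - p  # gradient of the log-likelihood per sample
            grad_b += err
            for j, x in enumerate(row):
                grad_w[j] += err * x
        intercept += lr * grad_b / len(X)
        coefs = [w + lr * g / len(X) for w, g in zip(coefs, grad_w)]
    return intercept, coefs

# Toy data: label 1 goes with a high first feature and a low second feature.
X = [[1.0, 3.0], [1.5, 2.5], [3.0, 1.0], [2.5, 1.2], [2.0, 2.2], [2.2, 1.8]]
y = [0, 0, 1, 1, 0, 1]

intercept, coefs = fit_logistic(X, y)
print(intercept, coefs)
```

A positive coefficient means that higher expression pushes the score toward the positive class, and a negative one pushes it away, which is how the signs in Table 2 are read.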
The diagnostic model construction module 500 is used for constructing an idiopathic pulmonary fibrosis diagnostic model according to the expression levels of the characteristic genes and their regression coefficients.
The diagnosis module 600 is used for calculating a diagnosis score by the idiopathic pulmonary fibrosis diagnosis model based on the expression level of the characteristic gene of the subject.
The diagnostic model of idiopathic pulmonary fibrosis calculates a diagnostic score by:
1/(1+exp(-z)), where z = [(-1.34326×TLR10)+(0.30552×GZMK)+(-1.1445×CD79A)+(-0.54664×NOG)+(0.74687×P2RY10)+(-1.56761×KLRB1)+(-0.83828×N6AMT1)+(-0.51833×EIF1AX)+(-0.27993×GCNT4)+(1.70369×FCRLA)+(-2.0675×CD40LG)+(0.97147×CD69)+(1.54353×ABCA13)+(-0.08833×RNASE3)+(0.48974×CEACAM6)+(0.97489×USP9Y)+(-0.49965×OLFM4)+(0.39967×BPI)+(1.07367×UTY)+(-1.06128×RPS4Y1)+(0.07629×DDX3Y)+32.73267]/10.
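The score formula can be written out as a short Python sketch. The coefficients and the intercept are taken from Table 2 and the division by 10 from the formula above; the function name, the dictionary layout and the all-zero test profile are assumptions made for illustration, and the sketch presumes that expression values are on the same scale as the training data.

```python
import math

INTERCEPT = 32.73267
COEFFICIENTS = {
    "TLR10": -1.34326, "GZMK": 0.30552, "CD79A": -1.1445, "NOG": -0.54664,
    "P2RY10": 0.74687, "KLRB1": -1.56761, "N6AMT1": -0.83828,
    "EIF1AX": -0.51833, "GCNT4": -0.27993, "FCRLA": 1.70369,
    "CD40LG": -2.0675, "CD69": 0.97147, "ABCA13": 1.54353,
    "RNASE3": -0.08833, "CEACAM6": 0.48974, "USP9Y": 0.97489,
    "OLFM4": -0.49965, "BPI": 0.39967, "UTY": 1.07367,
    "RPS4Y1": -1.06128, "DDX3Y": 0.07629,
}

def diagnostic_score(expression):
    """Return 1/(1+exp(-z)) for a dict mapping gene symbol to expression."""
    missing = [g for g in COEFFICIENTS if g not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    linear = INTERCEPT + sum(c * expression[g] for g, c in COEFFICIENTS.items())
    z = linear / 10.0  # the formula divides the bracketed sum by 10
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical profile with every expression value set to zero,
# which leaves only the intercept term in z.
zero_profile = {gene: 0.0 for gene in COEFFICIENTS}
baseline = diagnostic_score(zero_profile)
print(baseline)
```

Because the logistic function is monotone, raising the expression of a gene with a positive coefficient (for example FCRLA) raises the score, and raising one with a negative coefficient (for example CD40LG) lowers it.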
The predictive power of the model was tested with ROC curves; an AUC > 0.7 is generally considered to indicate good discrimination. The ROC curve of the training set is shown in FIG. 7: the maximum Youden index is 0.656, the area under the ROC curve (AUC) is 0.893 (95% CI 0.845-0.941), and the optimal cut-off value is 0.875, at which the sensitivity is 0.723 and the specificity is 0.933. The ROC curve of the validation set is shown in FIG. 8. The model has good predictive ability.
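The relation between the reported cut-off statistics can be checked with a small Python sketch: the Youden index is sensitivity + specificity - 1, and the optimal cut-off is the threshold that maximizes it. The helper names, the rule "positive when score >= cut-off" and the toy scores are assumptions of this sketch, not details given in the source.

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic."""
    return sensitivity + specificity - 1.0

def best_cutoff(scores, labels):
    """Return (cutoff, J, sensitivity, specificity) maximizing Youden's J.

    A sample is predicted positive when its score is >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= cutoff)
        tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < cutoff)
        sens = tp / positives
        spec = tn / negatives
        j = youden_index(sens, spec)
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best

# Sensitivity and specificity reported for the training set.
reported_j = youden_index(0.723, 0.933)
print(round(reported_j, 3))

# Made-up scores and labels to show the cut-off search.
toy_scores = [0.2, 0.4, 0.6, 0.8, 0.9]
toy_labels = [0, 0, 1, 0, 1]
print(best_cutoff(toy_scores, toy_labels))
```

The reported sensitivity and specificity reproduce the stated index of 0.656, which confirms that the three training-set figures are mutually consistent.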
Referring to FIG. 9, a method for constructing an idiopathic pulmonary fibrosis diagnosis model is provided, comprising:
S100, acquiring gene expression profile chip data of IPF patients from the GEO database, and constructing a chip data training set;
S200, analyzing the differentially expressed genes between the IPF and control groups by a Bayesian test using the chip data training set, with the screening condition p.adj < 0.05 and |logFC| > 0.5, and screening out the differential genes;
S300, screening characteristic genes from the differential genes based on a random forest classifier;
S400, fitting a logistic regression model on the training set based on the characteristic genes to obtain regression coefficients of the characteristic genes;
S500, constructing an idiopathic pulmonary fibrosis diagnosis model according to the expression levels of the characteristic genes and their regression coefficients.
Acquiring the IPF patient gene expression profile chip data from the GEO database and constructing the chip data training set comprises:
Gene expression profile chip data of IPF patients are acquired from the GEO database, comprising the datasets GSE132607, GSE38958 and GSE28221. The data are log2-transformed and the probes are annotated; GSE132607 and GSE38958 are then merged, the batch effect of the merged data is removed with the removeBatchEffect function of the R package limma, and the result is integrated into the chip data training set. GSE28221 is used as the validation set.
Screening the characteristic genes from the differential genes based on a random forest classifier comprises:
The number of variables of the random forest classifier is set to 18 and the number of trees to 1000, and characteristic genes with an importance greater than 1 are screened out. The characteristic genes comprise TLR10, GZMK, CD79A, NOG, P2RY10, KLRB1, N6AMT1, EIF1AX, GCNT4, FCRLA, CD40LG, CD69, ABCA13, RNASE3, CEACAM6, USP9Y, OLFM4, BPI, UTY, RPS4Y1 and DDX3Y.
The regression coefficients of the characteristic genes are: TLR10 is -1.34326, GZMK is 0.30552, CD79A is -1.1445, NOG is -0.54664, P2RY10 is 0.74687, KLRB1 is -1.56761, N6AMT1 is -0.83828, EIF1AX is -0.51833, GCNT4 is -0.27993, FCRLA is 1.70369, CD40LG is -2.0675, CD69 is 0.97147, ABCA13 is 1.54353, RNASE3 is -0.08833, CEACAM6 is 0.48974, USP9Y is 0.97489, OLFM4 is -0.49965, BPI is 0.39967, UTY is 1.07367, RPS4Y1 is -1.06128 and DDX3Y is 0.07629.
The idiopathic pulmonary fibrosis diagnosis model calculates a diagnosis score by:
1/(1+exp(-z)), where z = [(-1.34326×TLR10)+(0.30552×GZMK)+(-1.1445×CD79A)+(-0.54664×NOG)+(0.74687×P2RY10)+(-1.56761×KLRB1)+(-0.83828×N6AMT1)+(-0.51833×EIF1AX)+(-0.27993×GCNT4)+(1.70369×FCRLA)+(-2.0675×CD40LG)+(0.97147×CD69)+(1.54353×ABCA13)+(-0.08833×RNASE3)+(0.48974×CEACAM6)+(0.97489×USP9Y)+(-0.49965×OLFM4)+(0.39967×BPI)+(1.07367×UTY)+(-1.06128×RPS4Y1)+(0.07629×DDX3Y)+32.73267]/10.
Finally, it is noted that the above-mentioned embodiments illustrate rather than limit the invention, and those skilled in the art will understand that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (4)

1. The idiopathic pulmonary fibrosis diagnosis system is characterized by comprising a data acquisition module, a differential gene screening module, a characteristic gene screening module, a regression coefficient calculation module, a diagnosis model construction module and a diagnosis module;
The data acquisition module is used for acquiring the gene expression profile chip data of the IPF patient through the GEO database and constructing a chip data training set;
The differential gene screening module is used for analyzing the differentially expressed genes between the IPF and control groups by a Bayesian test using the chip data training set, with the screening condition p.adj < 0.05 and |logFC| > 0.5, and screening out the differential genes;
the characteristic gene screening module is used for screening the differential genes based on a random forest classifier to obtain characteristic genes;
The regression coefficient calculation module is used for fitting a logistic regression model in a training set based on the characteristic genes to obtain regression coefficients of the characteristic genes;
the diagnosis model construction module is used for constructing an idiopathic pulmonary fibrosis diagnosis model according to the expression quantity of the characteristic genes and the regression coefficient thereof;
the diagnosis module is used for calculating a diagnosis score through the idiopathic pulmonary fibrosis diagnosis model based on the expression quantity of the characteristic genes of the person to be detected;
Acquiring the IPF patient gene expression profile chip data from the GEO database and constructing the chip data training set comprises:
acquiring gene expression profile chip data of IPF patients from the GEO database, the data comprising the datasets GSE132607, GSE38958 and GSE28221; log2-transforming the data and annotating the probes, merging GSE132607 and GSE38958, removing the batch effect of the merged data with the removeBatchEffect function of the R package limma, and integrating the result into the chip data training set, wherein GSE28221 is used as the validation set;
Screening the characteristic genes from the differential genes based on a random forest classifier comprises:
setting the number of variables of the random forest classifier to 18 and the number of trees to 1000, and screening out characteristic genes with an importance greater than 1, wherein the characteristic genes comprise TLR10, GZMK, CD79A, NOG, P2RY10, KLRB1, N6AMT1, EIF1AX, GCNT4, FCRLA, CD40LG, CD69, ABCA13, RNASE3, CEACAM6, USP9Y, OLFM4, BPI, UTY, RPS4Y1 and DDX3Y;
Regression coefficients of the characteristic genes: TLR10 is -1.34326, GZMK is 0.30552, CD79A is -1.1445, NOG is -0.54664, P2RY10 is 0.74687, KLRB1 is -1.56761, N6AMT1 is -0.83828, EIF1AX is -0.51833, GCNT4 is -0.27993, FCRLA is 1.70369, CD40LG is -2.0675, CD69 is 0.97147, ABCA13 is 1.54353, RNASE3 is -0.08833, CEACAM6 is 0.48974, USP9Y is 0.97489, OLFM4 is -0.49965, BPI is 0.39967, UTY is 1.07367, RPS4Y1 is -1.06128 and DDX3Y is 0.07629.
2. The idiopathic pulmonary fibrosis diagnostic system of claim 1, wherein the idiopathic pulmonary fibrosis diagnostic model calculates a diagnostic score by:
1/(1+exp(-z)),
wherein z = [(-1.34326×TLR10)+(0.30552×GZMK)+(-1.1445×CD79A)+(-0.54664×NOG)+(0.74687×P2RY10)+(-1.56761×KLRB1)+(-0.83828×N6AMT1)+(-0.51833×EIF1AX)+(-0.27993×GCNT4)+(1.70369×FCRLA)+(-2.0675×CD40LG)+(0.97147×CD69)+(1.54353×ABCA13)+(-0.08833×RNASE3)+(0.48974×CEACAM6)+(0.97489×USP9Y)+(-0.49965×OLFM4)+(0.39967×BPI)+(1.07367×UTY)+(-1.06128×RPS4Y1)+(0.07629×DDX3Y)+32.73267]/10.
3. A method for constructing an idiopathic pulmonary fibrosis diagnosis model, characterized by comprising the following steps:
Acquiring gene expression profile chip data of IPF patients from the GEO database, and constructing a chip data training set;
Analyzing the differentially expressed genes between the IPF and control groups by a Bayesian test using the chip data training set, with the screening condition p.adj < 0.05 and |logFC| > 0.5, and screening out the differential genes;
Screening characteristic genes from the differential genes based on a random forest classifier;
Fitting a logistic regression model on the training set based on the characteristic genes to obtain regression coefficients of the characteristic genes;
Constructing an idiopathic pulmonary fibrosis diagnosis model according to the expression levels of the characteristic genes and their regression coefficients;
Acquiring the IPF patient gene expression profile chip data from the GEO database and constructing the chip data training set comprises:
acquiring gene expression profile chip data of IPF patients from the GEO database, the data comprising the datasets GSE132607, GSE38958 and GSE28221; log2-transforming the data and annotating the probes, merging GSE132607 and GSE38958, removing the batch effect of the merged data with the removeBatchEffect function of the R package limma, and integrating the result into the chip data training set, wherein GSE28221 is used as the validation set;
Screening the characteristic genes from the differential genes based on a random forest classifier comprises:
setting the number of variables of the random forest classifier to 18 and the number of trees to 1000, and screening out characteristic genes with an importance greater than 1, wherein the characteristic genes comprise TLR10, GZMK, CD79A, NOG, P2RY10, KLRB1, N6AMT1, EIF1AX, GCNT4, FCRLA, CD40LG, CD69, ABCA13, RNASE3, CEACAM6, USP9Y, OLFM4, BPI, UTY, RPS4Y1 and DDX3Y;
Regression coefficients of the characteristic genes: TLR10 is -1.34326, GZMK is 0.30552, CD79A is -1.1445, NOG is -0.54664, P2RY10 is 0.74687, KLRB1 is -1.56761, N6AMT1 is -0.83828, EIF1AX is -0.51833, GCNT4 is -0.27993, FCRLA is 1.70369, CD40LG is -2.0675, CD69 is 0.97147, ABCA13 is 1.54353, RNASE3 is -0.08833, CEACAM6 is 0.48974, USP9Y is 0.97489, OLFM4 is -0.49965, BPI is 0.39967, UTY is 1.07367, RPS4Y1 is -1.06128 and DDX3Y is 0.07629.
4. The method of constructing an idiopathic pulmonary fibrosis diagnostic model of claim 3, wherein the idiopathic pulmonary fibrosis diagnostic model calculates a diagnostic score by:
1/(1+exp(-z)),
wherein z = [(-1.34326×TLR10)+(0.30552×GZMK)+(-1.1445×CD79A)+(-0.54664×NOG)+(0.74687×P2RY10)+(-1.56761×KLRB1)+(-0.83828×N6AMT1)+(-0.51833×EIF1AX)+(-0.27993×GCNT4)+(1.70369×FCRLA)+(-2.0675×CD40LG)+(0.97147×CD69)+(1.54353×ABCA13)+(-0.08833×RNASE3)+(0.48974×CEACAM6)+(0.97489×USP9Y)+(-0.49965×OLFM4)+(0.39967×BPI)+(1.07367×UTY)+(-1.06128×RPS4Y1)+(0.07629×DDX3Y)+32.73267]/10.
CN202410189821.8A 2024-02-20 2024-02-20 Method for constructing idiopathic pulmonary fibrosis diagnosis model and diagnosis system Active CN117747093B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410189821.8A CN117747093B (en) 2024-02-20 2024-02-20 Method for constructing idiopathic pulmonary fibrosis diagnosis model and diagnosis system

Publications (2)

Publication Number Publication Date
CN117747093A CN117747093A (en) 2024-03-22
CN117747093B CN117747093B (en) 2024-06-07

Family

ID=90251206

Country Status (1)

Country Link
CN (1) CN117747093B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014144564A2 (en) * 2013-03-15 2014-09-18 Veracyte, Inc. Biomarkers for diagnosis of lung diseases and methods of use thereof
CN107099581A (en) * 2012-03-27 2017-08-29 弗·哈夫曼-拉罗切有限公司 The method of prediction, diagnosis and treatment idiopathic pulmonary fibrosis
CN114283885A (en) * 2021-12-25 2022-04-05 重庆医科大学 Method for constructing diagnosis model of prostate cancer
CN114864003A (en) * 2022-03-17 2022-08-05 中国科学院深圳先进技术研究院 Differential analysis method and system based on single cell samples of mixed experimental group and control group
CN115261454A (en) * 2022-04-20 2022-11-01 合肥市传染病医院(合肥市第六人民医院) Novel let-7d-5p and miR-140-5p biomarker panel diagnosis method
CN117497062A (en) * 2023-11-15 2024-02-02 广州瑞能精准医学科技有限公司 Method for constructing idiopathic pulmonary fibrosis plasma cell characteristic gene prognosis model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
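"Genetic screening study of idiopathic pulmonary fibrosis and a preliminary exploration of the mechanism of its acute exacerbation"; 范珊珊; China Master's Theses Full-text Database (Medicine and Health Sciences); 2021-08-15 (No. 08); p. E063-13 *
"Screening and bioinformatics analysis of genes related to idiopathic pulmonary fibrosis"; 邢静; 黄鑫炎; 郭禹标; Journal of Sun Yat-sen University (Medical Sciences); 2017-11-15 (No. 06); pp. 131-135, 142 *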

Similar Documents

Publication Publication Date Title
CN112951406A (en) Lung cancer prognosis auxiliary evaluation method and system based on CT (computed tomography) image omics
CN112635056A (en) Lasso-based esophageal squamous carcinoma patient risk prediction nomogram model establishing method
CN112641451B (en) Multi-scale residual error network sleep staging method and system based on single-channel electroencephalogram signal
CN113989833B (en) EFFICIENTNET network-based oral mucosa disease identification method
CN112950614A (en) Breast cancer detection method based on multi-scale cavity convolution
CN110991536A (en) Training method of early warning model of primary liver cancer
CN111833330B (en) Lung cancer intelligent detection method and system based on fusion of image and machine smell
CN113593708A (en) Sepsis prognosis prediction method based on integrated learning algorithm
CN117672522A (en) A method for survival prediction of osteosarcoma based on machine learning model
CN114121286A (en) Exhaled breath detection-based disease risk assessment method and device and related products
CN118799648A (en) A deep learning medical image processing classification system combined with multimodal big data
CN117747093B (en) Method for constructing idiopathic pulmonary fibrosis diagnosis model and diagnosis system
CN111748634A (en) A combination of characteristic lincRNA expression profiles and an early prediction method for colon cancer
CN115691813A (en) Genetic gastric cancer assessment method and system based on genomics and microbiomics
CN118538416A (en) Method for predicting colorectal cancer distant metastasis state
CN112690815A (en) System and method for assisting in diagnosing lesion grade based on lung image report
CN118039116A (en) Method and system for constructing gestational diabetes judgment model based on machine learning
CN116628557A (en) Method and device for assessing organic bowel disease type based on belief rule base reasoning
CN113989543B (en) A COVID-19 medical image detection and classification method and device
CN115810122A (en) SPECT/CT-based deep learning method for detecting activity of thyroid-related ophthalmopathy
CN110530816B (en) Method for early diagnosis of rice blast by using infrared photoacoustic spectrum
CN113345525A (en) Analysis method for reducing influence of covariates on detection result in high-throughput detection
CN119991669B (en) Deep learning-based children pneumonia lung image analysis and evaluation system and method
CN112489038A (en) Fuzzy model breast cancer diagnosis method based on fuzzy clustering and generalized least square method
CN114878832B (en) Idiopathic pulmonary fibrosis plasma protein markers and their use in preparing detection reagents or diagnostic tools

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant