CN114283890A - A disease risk prediction method and device based on Ruminococcus microbiota

Info

Publication number
CN114283890A
CN114283890A CN202111533823.7A CN202111533823A CN114283890A CN 114283890 A CN114283890 A CN 114283890A CN 202111533823 A CN202111533823 A CN 202111533823A CN 114283890 A CN114283890 A CN 114283890A
Authority
CN
China
Prior art keywords
disease
risk prediction
ruminococcus
disease risk
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111533823.7A
Other languages
Chinese (zh)
Other versions
CN114283890B (en)
Inventor
徐婷
季明辉
许勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Medical University
Original Assignee
Nanjing Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Medical University
Priority to CN202111533823.7A
Publication of CN114283890A
Application granted
Publication of CN114283890B
Legal status: Active
Anticipated expiration

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 - Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 - Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to a disease risk prediction method and device based on Ruminococcus microbiota. The method obtains relative abundance information of the intestinal flora; determines sample classification variable data; inputs the intestinal flora feature data and the classification variables into a machine learning model for training to obtain a disease risk prediction model; adjusts parameters using a grid search algorithm; tests the model; evaluates its performance; and performs disease risk prediction using a disease risk prediction model that passes the performance evaluation. The advantages are as follows: the risk of multiple diseases can be predicted from the Ruminococcus microbiota and sample-related information, which are simple to obtain and cause no trauma to the tested person; a random forest model is used to screen non-invasive biomarkers for predicting the risk of multiple diseases from complex and voluminous biological big data, which improves prediction accuracy and fills a gap in clinical early warning for different diseases; and the prediction method is simple, fast, and efficient, and can quickly guide or assist the tested population in follow-up handling.

Description

Disease risk prediction method and device based on Ruminococcus microbiota
Technical Field
The invention relates to the technical field of biomedicine, and in particular to a disease risk prediction method and device based on Ruminococcus microbiota, a computer device, and a computer-readable storage medium.
Background
Diseases are generally predicted and diagnosed with invasive methods such as blood tests and biopsies, which take a long time and cause trauma to patients and healthy people alike. In addition, once the patient receives the test report, a doctor must still interpret it, which lengthens the overall testing cycle. Moreover, because doctors differ in experience, the same indicator may yield different predictions and diagnoses and may need to be confirmed repeatedly, so efficiency and accuracy are low.
Furthermore, by the time patients and apparently healthy people receive a diagnosis, they are generally already ill, mostly at a middle or advanced stage, so only treatment is possible. Existing invasive tests therefore do not help patients prevent disease.
At present, the related art offers no effective solution to the low efficiency and low accuracy of invasive testing or to the trauma it causes the tested person.
Disclosure of Invention
The present application aims to overcome the defects of the prior art by providing a disease risk prediction method and device based on Ruminococcus microbiota, a computer device, and a computer-readable storage medium, so as to at least solve the problems in the related art of low efficiency and low accuracy of invasive testing and of trauma to the tested person. To achieve this object, the present application adopts the following technical solutions:
In a first aspect, the present invention provides a disease risk prediction method based on Ruminococcus microbiota, comprising:
acquiring relative abundance information from stool sample metagenomic data of a disease population and a healthy population;
determining intestinal flora feature data according to the relative abundance information and pre-screened disease biomarkers, wherein the disease biomarkers are pre-screened according to a literature review and historical relative abundance information of differential bacteria, and the historical relative abundance information of differential bacteria is obtained by performing difference analysis on the historical relative abundance information of the disease population and that of the healthy population;
determining classification variables of the disease population and the healthy population;
inputting the intestinal flora feature data and the classification variables into a pre-established machine learning model for training to obtain a disease risk prediction model;
predicting disease risk using the disease risk prediction model;
wherein the disease includes inflammation, atherosclerosis, tumors, hypertension, diabetes, and infection;
wherein the biomarker is the Ruminococcus microbiota;
wherein the classification variables include gender, age, antibiotic usage, smoking status, smoking history, and country;
wherein the machine learning model comprises a random forest model, a decision tree model and an AdaBoost model.
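The data flow of the first aspect can be illustrated with a minimal Python sketch. It assumes that relative abundances arrive as one dictionary per sample, keeps only pre-screened Ruminococcus markers, and appends the classification variables. The helper name `build_feature_row`, the field names in `CLASS_VARS`, the shortened marker list, and the sample values are all hypothetical and are not taken from the patent's implementation.

```python
# Illustrative only: the marker subset, field names, and helper are assumptions.
BIOMARKERS = [
    "Ruminococcus_gnavus",
    "Ruminococcus_obeum",
    "Ruminococcus_torques",
    "Ruminococcus_bromii",
]
CLASS_VARS = ["gender", "age", "antibiotic_use", "smoking",
              "smoking_history", "country"]


def build_feature_row(abundance, metadata):
    """Return marker abundances followed by classification variables.

    Markers missing from a sample are treated as abundance 0.0, and
    missing metadata fields are recorded as the string "NA".
    """
    markers = [float(abundance.get(name, 0.0)) for name in BIOMARKERS]
    covariates = [metadata.get(name, "NA") for name in CLASS_VARS]
    return markers + covariates


# Made-up sample; the non-marker taxon is ignored by the selection step.
sample_abundance = {"Ruminococcus_gnavus": 1.25,
                    "Ruminococcus_bromii": 0.4,
                    "Bacteroides_fragilis": 7.9}
sample_metadata = {"gender": "female", "age": "adult", "country": "China"}

row = build_feature_row(sample_abundance, sample_metadata)
print(row)
```

Rows built this way would still need the classification variables converted to numeric form before being passed to a tree-based model.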
In some of these embodiments, the machine learning model is a random forest model.
In some of these embodiments, the inflammation comprises bronchitis, cystitis, otitis, pneumonia.
In some embodiments, obtaining relative abundance information of stool sample metagenomic data for a disease population and a healthy population comprises:
acquiring stool sample metagenomic data of the disease population and the healthy population;
and performing species annotation analysis and functional annotation analysis on the stool sample metagenomic data to obtain the relative abundance information of the disease population and the healthy population.
In some of these embodiments, obtaining stool sample metagenomic data for the diseased and healthy population comprises:
obtaining, by using the curatedMetagenomicData package, taxonomic and flora metagenomic information of human microbiome samples in the metamicdata repository data from the ExperimentHub R library;
screening and downloading metagenomic data and general information for samples derived from stool, wherein the sample metagenomic data comprises the flora taxonomic profile and flora relative abundance, and the sample general information comprises the experimental protocol, disease state, age, gender, antibiotic usage, region (or country), smoking status, and smoking history.
In some of these embodiments, performing species annotation analysis and functional annotation analysis on the stool sample metagenomic data comprises:
performing standardized naming of the Ruminococcus microbiota in the stool sample metagenomic data according to the taxonomy of the National Center for Biotechnology Information (NCBI).
In some embodiments, after the standardized naming is performed, the method further comprises:
pooling the abundances of the Ruminococcus microbiota from different studies.
In some of these embodiments, the method further comprises:
standardizing the naming of disease categories and merging them using Medical Subject Headings (MeSH).
In some embodiments, before inputting the characteristic data of the intestinal flora into a pre-established machine learning model for training to obtain a disease risk prediction model, the method further includes:
screening the intestinal flora feature data to identify samples in which the abundance of every Ruminococcus taxon is 0;
deleting those samples from the intestinal flora feature data;
and storing the intestinal flora feature data after the deletion.
In some embodiments, before inputting the characteristic data of the intestinal flora into a pre-established machine learning model for training to obtain a disease risk prediction model, the method further includes:
performing dummy variable transformation on the classification variables using IBM SPSS Statistics 23.0;
and storing the classification variables after the dummy variable transformation.
In some of these embodiments, predicting a disease risk using a disease risk prediction model comprises:
adjusting parameters of the disease risk prediction model using a grid search algorithm;
testing the parameter-adjusted disease risk prediction model using test data;
evaluating the performance of the disease risk prediction model with a confusion matrix according to the test results;
and predicting disease risk using the disease risk prediction model that passes the performance evaluation.
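Grid search, as named in the parameter adjustment step, tries every combination of candidate parameter values and keeps the best-scoring one. The sketch below is a generic exhaustive search in Python, not the patent's tuning code: the parameter names `n_trees` and `max_depth`, their candidate values, and the `toy_score` function (a stand-in for a validation accuracy) are all assumptions made for illustration.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Exhaustively score every parameter combination; return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:  # ties keep the first combination seen
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical stand-in for validation accuracy; peaks at (100, 5).
    return (1.0
            - abs(params["n_trees"] - 100) / 1000
            - abs(params["max_depth"] - 5) / 100)


grid = {"n_trees": [50, 100, 200], "max_depth": [3, 5, 10]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

In practice `score_fn` would train the model with the given parameters and return a cross-validated metric; the search loop itself stays the same.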
In some of these embodiments, the gender variable has 3 levels: NA, male, and female.
In some of these embodiments, the age variable has 4 levels: newborn, child, adult, and elderly.
In some of these embodiments, the antibiotic usage variable has 3 levels: NA, used, and not used.
In some of these embodiments, the smoking status variable has 3 levels: NA, present, and absent.
In some of these embodiments, the smoking history variable has 3 levels: NA, present, and absent.
In some of these embodiments, the country variable has 9 levels: NA, Canada, China, the Netherlands, Israel, Finland, Russia, Sweden, and the USA.
In some of these embodiments, IBM SPSS Statistics 23.0 does not perform dummy variable transformation on the predicted variable, wherein the predicted variable is the disease category.
In some of these embodiments, the disease category variable has 8 levels: healthy, inflammation, atherosclerosis, tumor, hypertension, diabetes, infection, and other.
In some of these embodiments, the pre-screened biomarkers of disease comprise:
Ruminococcus_gnavus;
Ruminococcus_obeum;
Ruminococcus_torques;
Ruminococcus_albus;
Ruminococcus_bromii;
Ruminococcus_callidus;
Ruminococcus_champanellensis;
Ruminococcus_flavefaciens;
Ruminococcus_lactaris;
Ruminococcaceae_Faecalibacterium_prausnitzii;
Ruminococcaceae_bacterium_D16;
Ruminococcaceae;
Ruminococcus.
In a second aspect, the present invention provides a disease risk prediction device based on Ruminococcus microbiota, comprising:
the data acquisition module is used for acquiring the relative abundance information of the metagenome data of the stool samples of the disease population and the health population;
the characteristic data determining module is used for determining the characteristic data of the intestinal flora according to the relative abundance information and the pre-screened biomarkers of the diseases, wherein the biomarkers of the diseases are pre-screened according to literature review and the historical information of the relative abundance of the differential bacteria, and the historical information of the relative abundance of the differential bacteria is obtained by performing difference analysis on the historical information of the relative abundance of the disease population and the historical information of the relative abundance of the healthy population;
the classification variable determination module is used for determining classification variables of the disease population and the healthy population;
the model training module is used for inputting the intestinal flora feature data and the classification variables into a pre-established machine learning model for training to obtain a disease risk prediction model;
the risk prediction module is used for predicting the disease risk by utilizing the disease risk prediction model;
wherein the disease includes inflammation, atherosclerosis, tumors, hypertension, diabetes, infection;
wherein the biomarker is the Ruminococcus microbiota;
wherein the categorical variables include gender, age, antibiotic usage, smoking status, smoking history, country;
the machine learning model comprises a random forest model, a decision tree model and an Adaboost model.
In some of these embodiments, further comprising:
the parameter adjusting module is used for adjusting parameters of the disease risk prediction model by utilizing a grid search algorithm;
the test module is used for testing the disease risk prediction model after parameter adjustment by using test data;
the performance evaluation module is used for carrying out performance evaluation on the disease risk prediction model by using the confusion matrix according to the test result;
and the risk prediction module is also used for predicting the disease risk by using the disease risk prediction model qualified by the performance evaluation.
In some of these embodiments, further comprising:
the data cleaning module is used for screening the intestinal flora feature data to identify samples in which the abundance of every Ruminococcus taxon is 0, deleting those samples from the intestinal flora feature data, and storing the intestinal flora feature data after the deletion.
In some of these embodiments, further comprising:
the variable transformation module is used for performing dummy variable transformation on the classification variables using IBM SPSS Statistics 23.0 and storing the classification variables after the dummy variable transformation.
In some of these embodiments, the data acquisition module comprises:
the metagenomic data acquisition submodule is used for acquiring stool sample metagenomic data of the disease population and the healthy population;
and the annotation analysis submodule is used for performing species annotation analysis and functional annotation analysis on the stool sample metagenomic data to obtain the relative abundance information of the disease population and the healthy population.
In some of these embodiments, the pre-screened biomarkers of disease comprise:
Ruminococcus_gnavus;
Ruminococcus_obeum;
Ruminococcus_torques;
Ruminococcus_albus;
Ruminococcus_bromii;
Ruminococcus_callidus;
Ruminococcus_champanellensis;
Ruminococcus_flavefaciens;
Ruminococcus_lactaris;
Ruminococcaceae_Faecalibacterium_prausnitzii;
Ruminococcaceae_bacterium_D16;
Ruminococcaceae;
Ruminococcus.
In a third aspect, the present invention provides a use of the Ruminococcus microbiota for disease risk prediction.
In some of these embodiments, the Ruminococcus microbiota comprises:
Ruminococcus_gnavus;
Ruminococcus_obeum;
Ruminococcus_torques;
Ruminococcus_albus;
Ruminococcus_bromii;
Ruminococcus_callidus;
Ruminococcus_champanellensis;
Ruminococcus_flavefaciens;
Ruminococcus_lactaris;
Ruminococcaceae_Faecalibacterium_prausnitzii;
Ruminococcaceae_bacterium_D16;
Ruminococcaceae;
Ruminococcus.
In some embodiments, the disease comprises inflammation, atherosclerosis, tumor, hypertension, diabetes, and infection.
In a fourth aspect, the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the disease risk prediction method as described above when executing the computer program.
In a fifth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a disease risk prediction method as described above.
Compared with the related art, the disease risk prediction method and device based on Ruminococcus microbiota, the computer device, and the computer-readable storage medium provided by the embodiments of the present application can predict the risk of multiple diseases from the Ruminococcus microbiota; samples are simple to obtain, and the non-invasive test causes no trauma to the tested person. A random forest model is used to screen non-invasive biomarkers for predicting the risk of multiple diseases from complex and voluminous biological big data, which improves prediction accuracy and fills a gap in clinical early warning for different diseases. The prediction method is simple, fast, and efficient, and can quickly guide or assist the tested population in follow-up handling.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
Fig. 1 is flowchart (1) of a disease risk prediction method according to an embodiment of the present application;
Fig. 2 is flowchart (2) of a disease risk prediction method according to an embodiment of the present application;
Fig. 3 is flowchart (3) of a disease risk prediction method according to an embodiment of the present application;
Fig. 4 is flowchart (4) of a disease risk prediction method according to an embodiment of the present application;
Fig. 5 is a block diagram of a disease risk prediction device according to an embodiment of the present application;
Fig. 6 is a schematic diagram of the feature importance of variables according to an embodiment of the present application;
Fig. 7 is a flowchart of a grid search according to an embodiment of the present application;
Fig. 8 is a schematic diagram of grid search results according to an embodiment of the present application;
Fig. 9 is a schematic diagram of a confusion matrix according to an embodiment of the present application;
Fig. 10 is a schematic diagram of a grayscale map of the confusion matrix according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words in this application do not limit quantity and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it. The terms "first," "second," "third," and the like merely distinguish similar objects and do not denote a particular ordering of the objects.
Example 1
The invention provides a use of the Ruminococcus microbiota in disease risk prediction.
In some of these embodiments, the Ruminococcus microbiota comprises:
Ruminococcus_gnavus;
Ruminococcus_obeum;
Ruminococcus_torques;
Ruminococcus_albus;
Ruminococcus_bromii;
Ruminococcus_callidus;
Ruminococcus_champanellensis;
Ruminococcus_flavefaciens;
Ruminococcus_lactaris;
Ruminococcaceae_Faecalibacterium_prausnitzii;
Ruminococcaceae_bacterium_D16;
Ruminococcaceae;
Ruminococcus.
In some embodiments, the disease comprises inflammation, atherosclerosis, tumor, hypertension, diabetes, and infection.
Fig. 1 is flowchart (1) of a disease risk prediction method according to an embodiment of the present invention. As shown in Fig. 1, a disease risk prediction method based on Ruminococcus microbiota comprises:
Step S102, acquiring relative abundance information from stool sample metagenomic data of a disease population and a healthy population;
Step S104, determining intestinal flora feature data according to the relative abundance information and pre-screened disease biomarkers, wherein the disease biomarkers are pre-screened according to a literature review and historical relative abundance information of differential bacteria, and the historical relative abundance information of differential bacteria is obtained by performing difference analysis on the historical relative abundance information of the disease population and that of the healthy population;
Step S106, determining classification variables of the disease population and the healthy population;
Step S108, inputting the intestinal flora feature data and the classification variables into a pre-established machine learning model for training to obtain a disease risk prediction model;
Step S110, predicting disease risk using the disease risk prediction model;
wherein the disease includes inflammation, atherosclerosis, tumor, hypertension, diabetes, and infection;
wherein the intestinal flora is the Ruminococcus flora;
wherein the classification variables comprise gender, age, antibiotic usage, smoking status, smoking history, and country;
wherein the machine learning model comprises a random forest model, a decision tree model and an AdaBoost model.
Wherein the pre-screened biomarkers of the disease comprise:
Ruminococcus_gnavus;
Ruminococcus_obeum;
Ruminococcus_torques;
Ruminococcus_albus;
Ruminococcus_bromii;
Ruminococcus_callidus;
Ruminococcus_champanellensis;
Ruminococcus_flavefaciens;
Ruminococcus_lactaris;
Ruminococcaceae_Faecalibacterium_prausnitzii;
Ruminococcaceae_bacterium_D16;
Ruminococcaceae;
Ruminococcus.
In some embodiments, obtaining relative abundance information of stool sample metagenomic data for a disease population and a healthy population comprises:
acquiring stool sample metagenomic data of the disease population and the healthy population;
and performing species annotation analysis and functional annotation analysis on the stool sample metagenomic data to obtain the relative abundance information of the disease population and the healthy population.
In some of these embodiments, obtaining stool sample metagenomic data for the diseased and healthy population comprises:
obtaining, by using the curatedMetagenomicData package, taxonomic and flora metagenomic information of human microbiome samples in the metamicdata repository data from the ExperimentHub R library;
screening and downloading metagenomic data and general information for samples derived from stool, wherein the sample metagenomic data comprises the flora taxonomic profile and flora relative abundance, and the sample general information comprises the experimental protocol, disease state, age, gender, antibiotic usage, region (or country), smoking status, and smoking history.
In some of these embodiments, performing species annotation analysis and functional annotation analysis on the stool sample metagenomic data comprises:
performing standardized naming of the Ruminococcus microbiota in the stool sample metagenomic data according to the taxonomy of the National Center for Biotechnology Information (NCBI);
pooling the abundances of the Ruminococcus microbiota from different studies.
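Standardized naming followed by pooling can be pictured as mapping each study's taxon label to one canonical name and then summing abundances that land on the same name. The Python sketch below is only an illustration under that assumption: the synonym table `STANDARD_NAMES`, the record layout, and the numeric values are hypothetical, and the patent does not describe how the NCBI-based renaming is implemented.

```python
from collections import defaultdict

# Hypothetical synonym table: study-specific labels -> one standard name.
STANDARD_NAMES = {
    "Ruminococcus gnavus": "Ruminococcus_gnavus",
    "s__Ruminococcus_gnavus": "Ruminococcus_gnavus",
    "Ruminococcus bromii": "Ruminococcus_bromii",
}


def standardize_and_pool(records):
    """Pool abundances per sample after renaming taxa.

    records: iterable of (sample_id, raw_taxon_name, relative_abundance).
    Labels absent from the synonym table are kept unchanged.
    """
    pooled = defaultdict(dict)
    for sample_id, raw_name, abundance in records:
        name = STANDARD_NAMES.get(raw_name, raw_name)
        pooled[sample_id][name] = pooled[sample_id].get(name, 0.0) + abundance
    return dict(pooled)


# Made-up records: two labels for the same taxon appear in sample "A".
records = [
    ("A", "Ruminococcus gnavus", 0.5),
    ("A", "s__Ruminococcus_gnavus", 0.25),
    ("B", "Ruminococcus bromii", 1.5),
]
pooled = standardize_and_pool(records)
print(pooled)
```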
In some of these embodiments, further comprising:
standardizing the naming of disease categories and merging them using Medical Subject Headings (MeSH).
Through the above steps, the risk of multiple diseases can be predicted from the Ruminococcus microbiota; samples are simple to obtain, and the test causes no trauma to the tested person. A random forest model is used to screen non-invasive biomarkers for predicting the risk of multiple diseases from complex and voluminous biological big data, which improves prediction accuracy and fills a gap in clinical early warning for different diseases. The prediction method is simple, fast, and efficient, and can quickly guide or assist the tested population in follow-up handling.
Fig. 2 is flowchart (2) of a disease risk prediction method according to an embodiment of the present invention. As shown in Fig. 2, before inputting the intestinal flora feature data into a pre-established machine learning model for training to obtain a disease risk prediction model, the method further includes:
Step S202, screening the intestinal flora feature data to identify samples in which the abundance of every Ruminococcus taxon is 0;
Step S204, deleting those samples from the intestinal flora feature data;
Step S206, storing the intestinal flora feature data after the deletion.
Specifically, because a zero abundance value may reflect either a systematic error or a true absence, samples in which the abundance of every Ruminococcus taxon is 0 are deleted.
Through the above steps, the intestinal flora feature data is cleaned, which addresses data redundancy, missing values, and outliers.
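The cleaning rule in steps S202 to S206 can be sketched in a few lines of Python. This is a minimal illustration, assuming each sample is a dictionary of marker abundances; the function name `drop_all_zero_samples`, the two-marker list, and the sample values are hypothetical.

```python
def drop_all_zero_samples(samples, marker_names):
    """Split samples into (kept, removed).

    A sample is removed only when every listed marker has abundance 0
    (a marker missing from the sample counts as 0).
    """
    kept, removed = [], []
    for sample in samples:
        if any(sample.get(name, 0.0) > 0 for name in marker_names):
            kept.append(sample)
        else:
            removed.append(sample)
    return kept, removed


markers = ["Ruminococcus_gnavus", "Ruminococcus_bromii"]
samples = [
    {"id": "s1", "Ruminococcus_gnavus": 0.0, "Ruminococcus_bromii": 0.0},
    {"id": "s2", "Ruminococcus_gnavus": 0.3, "Ruminococcus_bromii": 0.0},
    {"id": "s3"},  # no marker recorded at all
]
kept, removed = drop_all_zero_samples(samples, markers)
print([s["id"] for s in kept], [s["id"] for s in removed])
```

Returning the removed samples alongside the kept ones makes it easy to audit how many samples the rule discards.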
Fig. 3 is flowchart (3) of a disease risk prediction method according to an embodiment of the present invention. As shown in Fig. 3, before inputting the intestinal flora feature data into a pre-established machine learning model for training to obtain a disease risk prediction model, the method further includes:
Step S302, performing dummy variable transformation on the classification variables using IBM SPSS Statistics 23.0;
Step S304, storing the classification variables after the dummy variable transformation.
In some of these embodiments, IBM SPSS Statistics 23.0 does not perform dummy variable transformation on the predicted variable, wherein the predicted variable is the disease category.
In some of these embodiments, the gender variable has 3 levels: NA, male, and female.
In some of these embodiments, the age variable has 4 levels: newborn, child, adult, and elderly.
In some of these embodiments, the antibiotic usage variable has 3 levels: NA, used, and not used.
In some of these embodiments, the smoking status variable has 3 levels: NA, present, and absent.
In some of these embodiments, the smoking history variable has 3 levels: NA, present, and absent.
In some of these embodiments, the country variable has 9 levels: NA, Canada, China, the Netherlands, Israel, Finland, Russia, Sweden, and the USA.
In some of these embodiments, the disease category variable has 8 levels: healthy, inflammation, atherosclerosis, tumor, hypertension, diabetes, infection, and other.
Through the above steps, because some fields in the data are unordered binary or multi-class variables whose arbitrarily assigned numeric codes would otherwise influence the model, IBM SPSS Statistics 23.0 is used to perform dummy variable transformation on the classification variables other than the predicted variable, so as to improve the model's goodness of fit.
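The patent performs this transformation in IBM SPSS Statistics; the Python sketch below only illustrates what a dummy variable transformation produces for one variable. It assumes one 0/1 indicator per level (the patent does not say whether a reference level is dropped), and the helper name `to_dummies` and the column naming scheme are hypothetical. The gender levels are the ones listed above.

```python
GENDER_LEVELS = ["NA", "male", "female"]  # levels listed in the patent


def to_dummies(value, levels, prefix):
    """Encode one categorical value as a dict of 0/1 indicator columns."""
    if value not in levels:
        raise ValueError(f"unexpected level {value!r} for {prefix}")
    return {f"{prefix}_{level}": int(value == level) for level in levels}


encoded = to_dummies("female", GENDER_LEVELS, "gender")
print(encoded)
```

Because each level gets its own column, no artificial ordering (such as male = 1, female = 2) is imposed on an unordered variable.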
Fig. 4 is flowchart (4) of a disease risk prediction method according to an embodiment of the present invention. As shown in Fig. 4, predicting disease risk using the disease risk prediction model includes:
Step S402, adjusting parameters of the disease risk prediction model using a grid search algorithm;
Step S404, testing the parameter-adjusted disease risk prediction model using test data;
Step S406, evaluating the performance of the disease risk prediction model with a confusion matrix according to the test results;
Step S408, predicting disease risk using the disease risk prediction model that passes the performance evaluation.
Through the above steps, the disease risk prediction model is optimized and the prediction accuracy is improved.
Fig. 5 is a block diagram of a disease risk prediction apparatus according to an embodiment of the present invention. As shown in Fig. 5, a Ruminococcus microbiota-based disease risk prediction apparatus 500 includes:
the data acquisition module 501 is used for acquiring the relative abundance information of the metagenome data of the stool samples of the disease population and the healthy population;
a characteristic data determination module 502, configured to determine characteristic data of the intestinal flora according to the relative abundance information and pre-screened biomarkers of the disease, where the biomarkers of the disease are pre-screened according to review of literature and historical information of relative abundance of differential bacteria, and the historical information of relative abundance of differential bacteria is obtained by performing difference analysis on the historical information of relative abundance of the disease population and the historical information of relative abundance of the healthy population;
a classification variable determination module 503 for determining classification variables of the disease population and the healthy population;
the model training module 504 is used for inputting the characteristic data of the intestinal flora into a pre-established machine learning model for training to obtain a disease risk prediction model;
a risk prediction module 508 for predicting a disease risk using the disease risk prediction model;
wherein the disease includes inflammation, atherosclerosis, tumor, hypertension, diabetes, infection;
wherein the intestinal flora is the Ruminococcus microbiota;
wherein the classification variables comprise sex, age, antibiotic usage, smoking status, smoking history, country;
the machine learning model comprises a random forest model, a decision tree model and an Adaboost model.
In some of these embodiments, the data acquisition module 501 includes:
a metagenome data acquisition submodule, configured to acquire the stool sample metagenome data of the disease population and the healthy population;
and an annotation analysis submodule, configured to perform species annotation analysis and functional annotation analysis on the stool sample metagenome data to obtain the relative abundance information of the disease population and the healthy population.
In some of these embodiments, the disease risk prediction device 500 further comprises:
a parameter adjusting module 505, configured to perform parameter adjustment on the disease risk prediction model by using a grid search algorithm;
a testing module 506, configured to test the disease risk prediction model after parameter adjustment by using the test data;
the performance evaluation module 507 is used for evaluating the performance of the disease risk prediction model by using the confusion matrix according to the test result;
the risk prediction module 508 is further configured to perform disease risk prediction using the disease risk prediction model qualified by performance evaluation.
In some of these embodiments, the disease risk prediction device 500 further comprises:
a data cleaning module 509, configured to screen the intestinal flora feature data for sample data in which the abundance of all Ruminococcus microbiota is 0, delete that sample data from the intestinal flora feature data, and store the intestinal flora feature data after the deletion.
In some of these embodiments, the disease risk prediction device 500 further comprises:
a variable transformation module 510, configured to perform dummy variable transformation on the classification variables in the intestinal flora feature data by using IBM SPSS Statistics 23.0, and to store the intestinal flora feature data after the dummy variable transformation, wherein the classification variables include gender, age, antibiotic usage, smoking status, smoking history, and country.
In addition, the disease risk prediction method of the embodiment of the present application may be implemented by a computer device. Components of the computer device may include, but are not limited to, a processor and a memory storing computer program instructions.
In some embodiments, the processor may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing embodiments of the present application.
In some embodiments, the memory may include mass storage for data or instructions. By way of example, and not limitation, the memory may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory may include removable or non-removable (or fixed) media, where appropriate. The memory may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory is a non-volatile memory. In particular embodiments, the memory includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory (FLASH), or a combination of two or more of these, where appropriate. The RAM may be a Static Random-Access Memory (SRAM) or a Dynamic Random-Access Memory (DRAM), where the DRAM may be a Fast Page Mode Dynamic Random-Access Memory (FPMDRAM), an Extended Data Out Dynamic Random-Access Memory (EDODRAM), a Synchronous Dynamic Random-Access Memory (SDRAM), and the like.
The memory may be used to store or cache various data files for processing and/or communication use, as well as possibly computer program instructions for execution by the processor.
The processor reads and executes the computer program instructions stored in the memory to implement any one of the disease risk prediction methods in the above embodiments.
In some of these embodiments, the computer device may also include a communication interface and a bus. The processor, the memory and the communication interface are connected through a bus and complete mutual communication.
The communication interface is used to implement communication among the modules, apparatuses, units, and/or devices in the embodiments of the present application. The communication interface may also implement data communication with other components, such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
A bus comprises hardware, software, or both, coupling the components of the computer device to one another. Buses include, but are not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example, and not limitation, a bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus, or a combination of two or more of these. A bus may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The computer device may perform the disease risk prediction method in the embodiments of the present application.
In addition, in combination with the disease risk prediction method in the above embodiments, the embodiments of the present application may provide a computer-readable storage medium for implementation. The computer-readable storage medium has computer program instructions stored thereon; when executed by a processor, the computer program instructions implement any of the disease risk prediction methods of the above embodiments.
Example 2
This embodiment is a specific application example of the present invention.
A method for predicting risk of disease based on ruminococcus microbiota comprising:
Step S501, using the curatedMetagenomicData package to acquire the classification and microbiota metagenome information of human microbiome samples from the metagenomic data repository in the ExperimentHub R library.
The samples comprise 10199 samples covering more than 30 health and disease types from 52 different studies.
Step S502, screening and downloading the metagenome data of stool-derived samples (taxonomic profile, relative abundance) and general sample information (experimental protocol, disease state, age, sex, antibiotic use, region (or country), smoking status, etc.).
The number of samples is 8799.
Step S503, determining feature data of the intestinal flora according to the relative abundance information and pre-screened biomarkers closely related to diseases and disease states, wherein the biomarkers of the diseases are pre-screened according to literature review and historical information on the relative abundance of differential bacteria, and the historical information on the relative abundance of differential bacteria is obtained by performing difference analysis on the historical relative abundance information of the disease population and that of the healthy population.
Step S504, standardizing the naming of the Ruminococcus microbiota by reference to the National Center for Biotechnology Information (NCBI) taxonomy and Wikipedia, and merging Ruminococcus microbiota abundances from different studies (see Table 1). The disease categories of the Medical Subject Headings (MeSH) were used to standardize the naming and merging of diseases.
TABLE 1 Standardized nomenclature of the Ruminococcus microbiota
Figure BDA0003411845020000141
Figure BDA0003411845020000151
TABLE 2 Disease classification table
Serial number | Category | Code
1 | Healthy | 1
2 | Inflammation (bronchitis, cystitis, otitis, pneumonia) | 2
3 | Atherosclerosis | 3
4 | Tumor | 4
5 | Hypertension | 5
6 | Diabetes | 6
7 | Infection | 7
8 | Others | 0
Step S505, performing data cleaning to address data redundancy, missing values, and outliers. Considering that a zero abundance value may reflect either a systematic error or a true absence, samples in which the abundance of every Ruminococcus microbiota member is 0 are deleted.
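The deletion rule in step S505 can be sketched in Python as follows. This is only an illustrative sketch using pandas; the table contents, the column names, and the helper drop_all_zero_samples are hypothetical and do not come from the original data set.

```python
import pandas as pd

# Hypothetical abundance table: one row per sample, one column per taxon.
features = pd.DataFrame({
    "Ruminococcus_gnavus": [0.0, 1.2, 0.0],
    "Ruminococcus_bromii": [0.0, 0.0, 0.4],
    "gender": ["male", "female", "NA"],
})
taxa_cols = ["Ruminococcus_gnavus", "Ruminococcus_bromii"]

def drop_all_zero_samples(df, abundance_cols):
    """Remove samples whose abundance is 0 for every listed taxon."""
    all_zero = (df[abundance_cols] == 0).all(axis=1)
    return df.loc[~all_zero].reset_index(drop=True)

cleaned = drop_all_zero_samples(features, taxa_cols)
print(len(features), "->", len(cleaned))
```

A sample is kept as long as at least one of the listed taxa has a non-zero abundance, so isolated zeros are retained.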
Step S506, considering that some fields are unordered binary and multi-class variables, and that the influence of their assigned values on the model cannot be excluded, IBM SPSS Statistics 23.0 is used to convert the classification variables other than the predicted variable (disease category), such as gender, antibiotic usage, smoking status, and country, into dummy variables, so as to improve the goodness of fit of the model (as shown in Table 3).
Table 3 dummy variable handling table
Figure BDA0003411845020000152
Step S507, training on the processed intestinal flora feature data (13 Ruminococcus microbiota features and 6 classification indexes (25 variables)) in the training set to obtain a disease type prediction model, and calculating the prediction accuracy of the different models by 10-fold cross-validation. The trained models are a random forest model (Random Forest), a decision tree model (Decision Tree), and an AdaBoost model (Adaboost); their accuracy is shown in Table 4, and the importance of each variable is shown in Fig. 6.
In the present invention, the RandomForestClassifier class in sklearn.ensemble is used for random forest training, the DecisionTreeClassifier class in sklearn.tree is used for decision tree analysis, and the AdaBoostClassifier class in sklearn.ensemble is used for AdaBoost analysis.
TABLE 4 accuracy
Figure BDA0003411845020000161
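The comparison of the three models by 10-fold cross-validation can be sketched as follows. This is a minimal sketch, assuming scikit-learn is available; the data are synthetic (generated by make_classification) rather than the patent's feature table, and the hyperparameter values are placeholders, so the printed scores have no relation to Table 4.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the feature table (the real data is not reproduced here).
X, y = make_classification(n_samples=300, n_features=25, n_informative=8,
                           n_classes=3, random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

# Mean accuracy over 10 folds for each candidate model.
mean_accuracy = {
    name: cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, score in mean_accuracy.items():
    print(f"{name}: {score:.3f}")
```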
Step S508, selecting the random forest model, and performing parameter optimization on the machine learning model by using a grid search algorithm.
Adjusting parameters with the grid search algorithm means stepping through the parameter values within a specified range at a given step size, training a learner with each value, and selecting from all candidates the parameter with the highest accuracy on the validation set.
Specifically, as shown in Fig. 7, the grid search flow is as follows:
determining the search range of the parameter n_estimators, where the range is 0 to 200 and the initial step size is 10;
calculating the accuracy of the model corresponding to each parameter value by cross-validation;
judging whether the search is finished; if the search is not finished, returning to the previous step; if the search is finished, executing the next step;
and outputting the optimal parameters.
The random forest model has framework parameters including n_estimators and oob_score, where n_estimators is the maximum number of decision trees in the random forest and is the parameter of main concern. The model score is evaluated using cross_val_score from sklearn.model_selection in Python, and grid search is used for model parameter adjustment. As shown in Table 5 and Fig. 8, the best model is obtained when n_estimators is 121, with a model score of 0.853.
TABLE 5 grid search results
Figure BDA0003411845020000162
Figure BDA0003411845020000171
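The grid search flow above can be sketched with scikit-learn as follows. This sketch substitutes GridSearchCV for a hand-written loop over cross_val_score; the function search_n_estimators, the synthetic data, and the reduced demo grid are assumptions made for illustration. Because n_estimators must be at least 1, the grid starts at the first step rather than at 0.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

def search_n_estimators(X, y, low=10, high=200, step=10, cv=10):
    """Return (best n_estimators, best mean CV accuracy) over an evenly spaced grid.

    n_estimators must be at least 1, so the grid starts at `low` rather than 0.
    """
    grid = {"n_estimators": list(range(low, high + 1, step))}
    search = GridSearchCV(RandomForestClassifier(random_state=0), grid,
                          cv=cv, scoring="accuracy")
    search.fit(X, y)
    return search.best_params_["n_estimators"], search.best_score_

# Small synthetic data set and a reduced grid so the demo finishes quickly.
X, y = make_classification(n_samples=200, n_features=25, n_informative=8,
                           n_classes=3, random_state=0)
best_n, best_score = search_n_estimators(X, y, low=10, high=50, step=10, cv=3)
print(best_n, round(best_score, 3))
```

A finer second pass around the best coarse value (for example with step 1) is one way a value such as 121 could be reached from an initial step size of 10.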
Step S509, testing the parameter-adjusted machine learning model using the test data and external validation data.
Step S510, evaluating the performance of the machine learning model by using the confusion matrix according to the external validation result.
Specifically, the classification model is evaluated using the confusion matrix in sklearn, and the accuracy (Accuracy), precision (Precision), recall (Recall), and F1 value (F1 score) are calculated by formula.
As shown in Table 6 and Fig. 9, in this 8×8 matrix the rows represent the true values of the samples and the columns represent the values predicted by the algorithm, so that the entry in row i, column j is the number of samples whose true value is i and whose predicted value is j. When the classification algorithm makes few prediction errors and the model performs well, most samples have both true value i and predicted value i, so the counts concentrate on the diagonal of the confusion matrix (row i, column i).
Table 6 Schematic representation of the confusion matrix
T00 F01 F02 F03 F04 F05 F06 F07
F10 T11 F12 F13 F14 F15 F16 F17
F20 F21 T22 F23 F24 F25 F26 F27
F30 F31 F32 T33 F34 F35 F36 F37
F40 F41 F42 F43 T44 F45 F46 F47
F50 F51 F52 F53 F54 T55 F56 F57
F60 F61 F62 F63 F64 F65 T66 F67
F70 F71 F72 F73 F74 F75 F76 T77
In Table 6, T denotes a correct prediction and F denotes an incorrect prediction; the first digit is the true class (the label value, i.e., the row) and the second digit is the predicted class (the column).
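The matrix layout described above matches the convention of confusion_matrix in sklearn.metrics, where rows are true classes and columns are predicted classes. The following minimal sketch uses a hypothetical three-class example rather than the eight disease categories; the label lists are invented for illustration.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted class codes for a handful of samples.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

# labels fixes the row/column order; rows are true classes, columns are predictions.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm)

correct = cm.trace()           # sum of the diagonal entries (T00 + T11 + T22)
accuracy = correct / cm.sum()  # correctly classified samples / all samples
```

Under this convention, the per-class recall divides a diagonal entry by its row sum, and the per-class precision divides it by its column sum.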
The accuracy (Accuracy) is calculated as follows:
Accuracy: the proportion of correctly classified samples among all samples.
A=(T00+T11+…+T77)/N=(121+118+111+94+125+76+121+131)/987=0.909。
The precision (Precision) is calculated as follows:
Precision: the proportion of samples predicted as positive that are actually positive.
P0=T00/(T00+F01+F02+…+F07)=121/(121+8+2+3+0+1+0)=0.896;
The remaining classes are calculated in the same way:
P1=0.983;P2=0.917;P3=0.879;P4=0.839;P5=0.974;P6=0.931;P7=0.891;
P=(P0+P1+…+P7)/8=0.914。
The recall (Recall) is calculated as follows:
Recall: the proportion of actually positive samples that are correctly predicted as positive.
R0=T00/(T00+F10+F20+…+F70)=121/135=0.896;
The remaining classes are calculated in the same way:
R1=0.648;R2=0.974;R3=0.989;R4=0.969;R5=1.000;R6=0.992;R7=0.978
R=(R0+R1+…+R7)/8=0.931。
The F1 value (F1 score) is calculated as follows:
F1 value: the harmonic mean of precision and recall.
F1=2*P*R/(P+R)=2*0.914*0.931/(0.914+0.931)=0.922。
Specific results are shown in table 7.
TABLE 7 external verification results Table
Figure BDA0003411845020000191
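The arithmetic above can be rechecked with a few lines of Python. The per-class values below are copied from the text; only the variable names are introduced here for illustration.

```python
# Per-class values copied from the text above.
diagonal = [121, 118, 111, 94, 125, 76, 121, 131]   # T00 ... T77
n_samples = 987
precision = [0.896, 0.983, 0.917, 0.879, 0.839, 0.974, 0.931, 0.891]
recall = [0.896, 0.648, 0.974, 0.989, 0.969, 1.000, 0.992, 0.978]

accuracy = sum(diagonal) / n_samples               # reported as 0.909
macro_p = sum(precision) / len(precision)          # reported as 0.914
macro_r = sum(recall) / len(recall)                # reported as 0.931
f1 = 2 * macro_p * macro_r / (macro_p + macro_r)   # reported as 0.922

print(f"A={accuracy:.3f} P={macro_p:.3f} R={macro_r:.3f} F1={f1:.3f}")
```

Here P and R are unweighted (macro) averages over the eight classes, and F1 is computed from those two averages rather than averaged per class.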
A grayscale image is plotted from the error matrix (error_matrix). As shown in Fig. 10, the type and number of classification errors made by the algorithm can be judged from the brightness of the grayscale image. In this embodiment, the brightness along the diagonal of the matrix is generally high, which indicates that the model prediction performance is good.
Step S511, predicting the disease type by using the disease prediction model qualified by performance evaluation.
A more specific embodiment of the present invention is as follows:
collecting fresh or properly frozen human stool, placing it on dry ice for preservation within 30 minutes, and transferring it to a -80 °C freezer as soon as possible, where it is stored until gut metagenome sequencing is performed;
extracting DNA, and performing quality control on the extracted nucleic acid by the agarose gel method, wherein the total amount of DNA is at least 1 μg and the DNA concentration is at least 20 ng/μL;
constructing a library for each sample of qualified quality, and performing paired-end sequencing on an Illumina HiSeq 4000;
after obtaining the raw paired-end metagenome sequencing data, performing quality control on the data with Trimmomatic software to remove low-quality sequences and adapters, and evaluating the quality-controlled data with FastQC software;
performing metagenome species annotation analysis on the quality-controlled data with MetaPhlAn2 software;
acquiring species abundance information of the intestinal flora of the population;
modeling with a machine learning method and ten-fold cross-validation, randomly dividing the data into a training set and a test set, adjusting parameters by grid search, and selecting the optimal parameters;
and acquiring a batch of external data that never participated in modeling, using the constructed model to predict on this batch of data, and judging the quality of the prediction model by means of a confusion matrix.
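The step of obtaining species abundance information from the annotation output can be sketched as follows. This is only an illustrative sketch: it assumes a two-column, tab-separated profile (lineage string, then relative abundance in percent) with header lines starting with "#", and the function ruminococcus_abundance, the abbreviated lineage strings, and the abundance values are hypothetical.

```python
import io

def ruminococcus_abundance(lines):
    """Collect species-level relative abundances for Ruminococcus species.

    Assumes a two-column, tab-separated profile: lineage string, then
    relative abundance (percent). Lines starting with '#' are headers.
    """
    result = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        lineage, abundance = line.rstrip("\n").split("\t")[:2]
        leaf = lineage.split("|")[-1]  # most specific rank on this line
        if leaf.startswith("s__Ruminococcus"):
            result[leaf[3:]] = float(abundance)  # strip the "s__" prefix
    return result

# Hypothetical profile text (lineages abbreviated) standing in for a real output file.
profile = io.StringIO(
    "#SampleID\tdemo\n"
    "k__Bacteria\t100.0\n"
    "k__Bacteria|g__Ruminococcus|s__Ruminococcus_bromii\t2.5\n"
    "k__Bacteria|g__Ruminococcus|s__Ruminococcus_gnavus\t0.8\n"
    "k__Bacteria|g__Bacteroides|s__Bacteroides_fragilis\t12.0\n"
)
abundances = ruminococcus_abundance(profile)
print(abundances)
```

In practice the same function could be applied to an open file handle for each sample, and the resulting dictionaries merged into one table with a row per sample.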
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for predicting risk of disease based on the Ruminococcus microbiota, comprising:
acquiring relative abundance information of the metagenome data of stool samples of a disease population and a healthy population;
determining feature data of the intestinal flora according to the relative abundance information and pre-screened biomarkers of the diseases, wherein the biomarkers of the diseases are pre-screened according to literature review and historical information on the relative abundance of differential bacteria, and the historical information on the relative abundance of differential bacteria is obtained by performing difference analysis on the historical relative abundance information of the disease population and of the healthy population;
determining classification variables of the disease population and the healthy population;
inputting the characteristic data of the intestinal flora and the classification variables into a pre-established machine learning model for training to obtain a disease risk prediction model;
predicting the disease risk by using a disease risk prediction model;
wherein the disease includes inflammation, atherosclerosis, tumors, hypertension, diabetes, infection;
wherein the biomarker is a ruminococcus microbiota;
wherein the categorical variables include gender, age, antibiotic usage, smoking status, smoking history, country;
the machine learning model comprises a random forest model, a decision tree model and an Adaboost model.
2. The method of claim 1, wherein before inputting the intestinal flora feature data and the classification variables into the pre-established machine learning model for training to obtain the disease risk prediction model, the method further comprises:
screening the intestinal flora feature data to obtain sample data in which the abundance of all Ruminococcus microbiota is 0;
deleting the sample data in which the abundance of all Ruminococcus microbiota is 0 from the intestinal flora feature data;
and storing the intestinal flora feature data after the deletion.
3. The method of claim 1 or 2, wherein before inputting the intestinal flora feature data and the classification variables into the pre-established machine learning model for training to obtain the disease risk prediction model, the method further comprises:
performing dummy variable transformation on the classification variables by using IBM SPSS statistics 23.0;
and storing the classification variables after the dummy variable transformation.
4. The method of any one of claims 1 to 3, wherein the disease risk prediction using the disease risk prediction model comprises:
adjusting parameters of the disease risk prediction model by using a grid search algorithm;
testing the disease risk prediction model after parameter adjustment by using the test data;
according to the test result, performing performance evaluation on the disease risk prediction model by using a confusion matrix;
and performing disease risk prediction by using the disease risk prediction model qualified in performance evaluation.
5. The method for predicting disease risk according to any one of claims 1 to 4, wherein the pre-screened biomarkers of disease comprise:
Ruminococcus_gnavus;
Ruminococcus_obeum;
Ruminococcus_torques;
Ruminococcus_albus;
Ruminococcus_bromii;
Ruminococcus_callidus;
Ruminococcus_champanellensis;
Ruminococcus_flavefaciens;
Ruminococcus_lactaris;
Ruminococcaceae_Faecalibacterium_prausnitzii;
Ruminococcaceae_bacterium_D16;
Ruminococcaceae;
Ruminococcus.
6. A Ruminococcus microbiota-based disease risk prediction device, comprising:
the data acquisition module is used for acquiring the relative abundance information of the metagenome data of the stool samples of the disease population and the health population;
the characteristic data determining module is used for determining the characteristic data of the intestinal flora according to the relative abundance information and the pre-screened biomarkers of the diseases, wherein the biomarkers of the diseases are pre-screened according to literature review and the historical information of the relative abundance of the differential bacteria, and the historical information of the relative abundance of the differential bacteria is obtained by performing difference analysis on the historical information of the relative abundance of the disease population and the historical information of the relative abundance of the healthy population;
the classification variable determining module is used for determining classification variables of the disease population and the healthy population;
the model training module is used for inputting the intestinal flora feature data and the classification variables into a pre-established machine learning model for training to obtain a disease risk prediction model;
the risk prediction module is used for predicting the disease risk by utilizing the disease risk prediction model;
wherein the disease includes inflammation, atherosclerosis, tumors, hypertension, diabetes, infection;
wherein the biomarker is a ruminococcus microbiota;
wherein the categorical variables include gender, age, antibiotic usage, smoking status, smoking history, country;
the machine learning model comprises a random forest model, a decision tree model and an Adaboost model.
7. The disease risk prediction device of claim 6, further comprising:
the parameter adjusting module is used for adjusting parameters of the disease risk prediction model by utilizing a grid search algorithm;
the test module is used for testing the disease risk prediction model after parameter adjustment by using test data;
the performance evaluation module is used for carrying out performance evaluation on the disease risk prediction model by using the confusion matrix according to the test result;
and the risk prediction module is also used for predicting the disease risk by using the disease risk prediction model qualified by the performance evaluation.
8. Use of a ruminococcus microbiota for the prediction of risk of disease.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a disease risk prediction method as claimed in any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements a disease risk prediction method according to any one of claims 1 to 4.
CN202111533823.7A 2021-12-15 2021-12-15 Disease risk prediction device based on rumen coccus microbiota Active CN114283890B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111533823.7A CN114283890B (en) 2021-12-15 2021-12-15 Disease risk prediction device based on rumen coccus microbiota

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111533823.7A CN114283890B (en) 2021-12-15 2021-12-15 Disease risk prediction device based on rumen coccus microbiota

Publications (2)

Publication Number Publication Date
CN114283890A true CN114283890A (en) 2022-04-05
CN114283890B CN114283890B (en) 2023-04-07

Family

ID=80872568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111533823.7A Active CN114283890B (en) 2021-12-15 2021-12-15 Disease risk prediction device based on rumen coccus microbiota

Country Status (1)

Country Link
CN (1) CN114283890B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115691813A (en) * 2022-12-30 2023-02-03 神州医疗科技股份有限公司 Genetic gastric cancer assessment method and system based on genomics and microbiomics
JP7270143B1 (en) 2022-05-30 2023-05-10 シンバイオシス・ソリューションズ株式会社 Disease evaluation index calculation system, method and program
CN116344040A (en) * 2023-05-22 2023-06-27 北京卡尤迪生物科技股份有限公司 Construction method of integrated model for intestinal flora detection and detection device thereof
TWI826332B (en) * 2023-06-08 2023-12-11 宏碁股份有限公司 Method and system for establishing disease prediction model
CN117789981A (en) * 2023-12-26 2024-03-29 康美华大基因技术有限公司 Fatty liver risk prediction method, device, system and storage medium
CN117854720A (en) * 2023-12-06 2024-04-09 广州达安临床检验中心有限公司 Autism risk prediction device and computer equipment based on fungus genus characteristic
CN118888078A (en) * 2024-07-08 2024-11-01 重庆医科大学附属儿童医院 A drug combination risk management system based on knowledge graph

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060269534A1 (en) * 2005-05-31 2006-11-30 The Iams Company Feline probiotic bifidobacteria
CN111430027A (en) * 2020-03-18 2020-07-17 浙江大学 Biomarkers of bipolar disorder based on gut microbes and their screening applications
CN112435756A (en) * 2020-11-30 2021-03-02 武汉益鼎天养生物科技有限公司 Intestinal flora associated disease risk prediction system based on mutual evidence of multiple data set differences
CN112509700A (en) * 2021-02-05 2021-03-16 中国医学科学院阜外医院 Stable coronary heart disease risk prediction method and device
CN112509701A (en) * 2021-02-05 2021-03-16 中国医学科学院阜外医院 Risk prediction method and device for acute coronary syndrome
CN112582029A (en) * 2020-12-14 2021-03-30 李志军 Analysis method for diversity of intestinal flora in acute lung injury
CN113186310A (en) * 2021-04-23 2021-07-30 复旦大学附属中山医院 Method for predicting healthy aging through relative abundance of intestinal flora
CN113380396A (en) * 2020-02-25 2021-09-10 深圳市奇云生物信息科技有限公司 Method for evaluating risks of multiple intestinal diseases based on fecal microbial markers and human DNA content and application

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060269534A1 (en) * 2005-05-31 2006-11-30 The Iams Company Feline probiotic bifidobacteria
CN113380396A (en) * 2020-02-25 2021-09-10 深圳市奇云生物信息科技有限公司 Method for evaluating risks of multiple intestinal diseases based on fecal microbial markers and human DNA content and application
CN111430027A (en) * 2020-03-18 2020-07-17 浙江大学 Biomarkers of bipolar disorder based on gut microbes and their screening applications
CN112435756A (en) * 2020-11-30 2021-03-02 武汉益鼎天养生物科技有限公司 Intestinal flora associated disease risk prediction system based on mutual evidence of multiple data set differences
CN112582029A (en) * 2020-12-14 2021-03-30 李志军 Analysis method for diversity of intestinal flora in acute lung injury
CN112509700A (en) * 2021-02-05 2021-03-16 中国医学科学院阜外医院 Stable coronary heart disease risk prediction method and device
CN112509701A (en) * 2021-02-05 2021-03-16 中国医学科学院阜外医院 Risk prediction method and device for acute coronary syndrome
CN113186310A (en) * 2021-04-23 2021-07-30 复旦大学附属中山医院 Method for predicting healthy aging through relative abundance of intestinal flora

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
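Hu Liangping: "Strategies for improving the goodness of fit of regression models (I): dummy variable transformation and other variable transformations", Sichuan Mental Health *
Chen Wei: "Gut microbiota: a new perspective for research on diet and health", Journal of Food Science and Technology *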

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7270143B1 (en) 2022-05-30 2023-05-10 シンバイオシス・ソリューションズ株式会社 Disease evaluation index calculation system, method and program
WO2023234188A1 (en) * 2022-05-30 2023-12-07 シンバイオシス・ソリューションズ株式会社 Disease evaluation indicator calculation system, method, and program
JP2023175142A (en) * 2022-05-30 2023-12-12 シンバイオシス・ソリューションズ株式会社 Disease evaluation index calculation system, method, and program
CN115691813A (en) * 2022-12-30 2023-02-03 神州医疗科技股份有限公司 Genetic gastric cancer assessment method and system based on genomics and microbiomics
CN116344040A (en) * 2023-05-22 2023-06-27 北京卡尤迪生物科技股份有限公司 Construction method of integrated model for intestinal flora detection and detection device thereof
CN116344040B (en) * 2023-05-22 2023-09-22 北京卡尤迪生物科技股份有限公司 Construction method of integrated model for intestinal flora detection and detection device thereof
TWI826332B (en) * 2023-06-08 2023-12-11 宏碁股份有限公司 Method and system for establishing disease prediction model
CN117854720A (en) * 2023-12-06 2024-04-09 广州达安临床检验中心有限公司 Autism risk prediction device and computer equipment based on fungus genus characteristic
CN117789981A (en) * 2023-12-26 2024-03-29 康美华大基因技术有限公司 Fatty liver risk prediction method, device, system and storage medium
CN117789981B (en) * 2023-12-26 2025-01-14 康美华大基因技术有限公司 Fatty liver risk prediction method, device, system and storage medium
CN118888078A (en) * 2024-07-08 2024-11-01 重庆医科大学附属儿童医院 A drug combination risk management system based on knowledge graph

Also Published As

Publication number Publication date
CN114283890B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN114283890A (en) A disease risk prediction method and device based on Ruminococcus microbiota
van Doorn et al. A comparison of machine learning models versus clinical evaluation for mortality prediction in patients with sepsis
Forstmeier et al. Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner's curse
Head et al. The extent and consequences of p-hacking in science
Yildirim et al. Classification with respect to colon adenocarcinoma and colon benign tissue of colon histopathological images with a new CNN model: MA_ColonNET
Degasperi et al. Evaluating strategies to normalise biological replicates of Western blot data
Chuard et al. Evidence that nonsignificant results are sometimes preferred: Reverse P-hacking or selective reporting?
Smith et al. Accounting for the complex hierarchical topology of EEG phase-based functional connectivity in network binarisation
Delaigle et al. Nonparametric regression with homogeneous group testing data
Tsiklidis et al. Using the National Trauma Data Bank (NTDB) and machine learning to predict trauma patient mortality at admission
CN114093512B (en) Survival prediction method based on multi-mode data and deep learning model
Inaguma et al. Increasing tendency of urine protein is a risk factor for rapid eGFR decline in patients with CKD: A machine learning-based prediction model by using a big database
CN112509701A (en) Risk prediction method and device for acute coronary syndrome
Meera et al. Towards a data-driven approach to screen for autism risk at 12 months of age
Petersen et al. Repeated measurements of blood lactate concentration as a prognostic marker in horses with acute colitis evaluated with classification and regression trees (CART) and random forest analysis
CN117058484A (en) Training method, device, system and electronic device of cell detection model
Pezzano et al. CoLe-CNN+: Context learning-Convolutional neural network for COVID-19-Ground-Glass-Opacities detection and segmentation
Liu et al. Prediction of acute kidney injury in patients with femoral neck fracture utilizing machine learning
Khitan et al. Predicting adverse outcomes in chronic kidney disease using machine learning methods: data from the modification of diet in renal disease
CN111145852B (en) Medical information processing method and device and computer readable storage medium
Baral et al. Redefining lobe-wise ground-glass opacity in COVID-19 through deep learning and its correlation with biochemical parameters
CN111276248A (en) State determination system and electronic device
WO2023230228A1 (en) Systems and methods for identification of structural variants based on an autoencoder
US20220328132A1 (en) Non-invasive methods and systems for detecting inflammatory bowel disease
Moraes et al. Predicting age from human lung tissue through multi-modal data integration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant