Disclosure of Invention
The main object of the invention is to provide a method for establishing an early warning model of comorbid mood disorders in patients with chronic heart failure. The method establishes the early warning model by means of machine learning and big data processing, and provides a feasible basis for early intervention in comorbid mood disorders of chronic heart failure patients.
To achieve the above object, according to an aspect of embodiments of the present invention, there is provided a method for establishing an early warning model of comorbid mood disorders in patients with chronic heart failure, the method comprising the steps of:
a. constructing a depression and anxiety risk index system for chronic heart failure patients;
this step comprises basic information collection and scale evaluation:
the basic information comprises social demographic data, disease-related information, biochemical indexes and physical examination results;
the scales include a depression self-rating scale, an anxiety self-rating scale, and the Minnesota heart failure quality of life questionnaire;
b. constructing a comprehensive risk assessment and early warning model of comorbid mood disorders in chronic heart failure patients;
this step comprises data acquisition and early warning model construction;
the data acquisition comprises data source selection, data selection and data processing;
the construction of the early warning model comprises taking a certain percentage of the total samples as a sample set, and establishing the early warning model through machine learning and big data processing.
Further, the social demographic data comprise the age, sex, education level, marital status, smoking history, eating habits, exercise habits, income level and social support of the patient; the disease-related information comprises the height, weight and body mass index of the patient, the course of heart failure, the history of major cardiovascular diseases and of other non-cardiovascular diseases, and the cardiac function grade; the biochemical indexes and physical examination results comprise B-type natriuretic peptide (BNP), hemoglobin, total bilirubin, total cholesterol, high-density lipoprotein cholesterol, heart rate variability, left ventricular internal diameter, left atrial internal diameter, left ventricular ejection fraction, N-terminal pro-B-type natriuretic peptide (NT-proBNP), high-sensitivity C-reactive protein (hs-CRP), and soluble ST2 (sST2).
Further, the data source comprises medical record files and historical data records; the data selection performs interactive query and data analysis mainly on the Spark platform, with an Oracle database used as an auxiliary tool for data exploration and data verification; the data processing adopts functional programming on Spark's core abstraction, the resilient distributed dataset (RDD), and structured programming on the DataFrame object of the Spark SQL component, and implements the data extraction task in Spark SQL.
Further, the data processing also comprises data preprocessing, data cleaning, data integration, data transformation and data reduction, through which the relevant data are screened out.
Further, the early warning model is specifically constructed by taking 70% of the total samples as a sample set and, based on comorbid mood disorder events in chronic heart failure, using the Spark MLlib library to apply Logistic regression and naive Bayes classification algorithms respectively, selecting the optimal model according to the AUC value, and establishing a risk early warning model with a time span of 1 year.
Further, the construction of the early warning model comprises constructing a multi-factor Logistic regression model, computing the maximum posterior probability through Bayes' theorem, determining the risk early warning critical points, and formulating an early warning mechanism.
According to the technical scheme of the invention and the technical scheme of further improvement in certain embodiments, the invention has the following beneficial effects:
with respect to the risk factors, the invention screens out the main risk factors by data mining and establishes a rigorous risk early warning system, which enables early detection and effective prevention and has guiding and practical significance for preventing and controlling mood disorders in patients with chronic heart failure.
Detailed Description
It should be noted that the specific embodiments, examples and features thereof may be combined with each other in the present application without conflict.
In order to make the technical solution of the present invention better understood, the technical solution of the present invention will be clearly and completely described below with reference to the specific embodiments and examples of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments and examples obtained by a person skilled in the art without any inventive step should fall within the protection scope of the present invention.
The method for establishing the early warning model of comorbid mood disorders in chronic heart failure patients enables rapid, early assessment and warning of a patient's mood disorder risk, so that early prevention and intervention can be carried out. The technical scheme comprises the following steps: ① constructing a depression and anxiety risk index system for patients with chronic heart failure; ② constructing a comprehensive risk assessment and early warning model of comorbid mood disorders in chronic heart failure patients.
Construction of an index system for depression and anxiety risk of patients with chronic heart failure
First, a study of the incidence and influencing factors of depression and anxiety in chronic heart failure patients. Patients meeting the study inclusion criteria undergo basic information collection and scale evaluation once their condition has stabilized, on days 1-2 after admission.
Social demographic data A, including: A1-patient age, A2-gender, A3-education level, A4-marital status, A5-smoking history, A6-dietary habits, A7-exercise habits, A8-income level, A9-social support;
disease-related information B, including: B1-height, B2-weight, B3-body mass index, B4-course of heart failure, B5-history of major cardiovascular disease, B6-history of other non-cardiovascular disease, B7-cardiac function grade;
biochemical indexes and physical examination results C, including: C1-B-type natriuretic peptide (BNP), C2-hemoglobin, C3-total bilirubin, C4-total cholesterol, C5-high-density lipoprotein cholesterol, C6-heart rate variability, C7-left ventricular internal diameter, C8-left atrial internal diameter, C9-left ventricular ejection fraction, C10-E/E', C11-N-terminal pro-B-type natriuretic peptide (NT-proBNP), C12-high-sensitivity C-reactive protein (hs-CRP), C13-soluble ST2 (sST2);
depression self-rating scale D;
anxiety self-rating scale F;
Minnesota heart failure quality of life questionnaire G.
All data are statistically processed with SPSS 23.0 statistical software to obtain the risk factors for mood disorders in chronic heart failure patients.
Secondly, constructing an index system of depression and anxiety risks of patients with chronic heart failure.
The patient data are first discretized into classes, a decision table is then constructed, and finally attribute reduction based on an information entropy algorithm is performed on the decision table to screen out the risk indexes.
(1) Discretization: let dis = (Fmax - Fmin)/3, where Fmax is the maximum value of an index and Fmin is its minimum value. The indexes to be discretized (including A2, A3, A4, A5, A6, A7, A8, A9, B5, B6 and B7) are divided into 3 classes, where 1 represents (Fmin + 2 × dis, Fmax), 2 represents (Fmin + dis, Fmin + 2 × dis), and 3 represents (Fmin, Fmin + dis).
(2) Constructing the decision table: the decision table comprises an object set (the research samples), U = {X1, X2, ..., Xn}, where X1, X2, ..., Xn represent the individual samples; an attribute set of condition attributes, C = {F1, F2, ..., Fn}, i.e. the risk indexes, where F1, F2, ..., Fn represent the individual indexes of the samples, i.e. the A-G indexes; and a decision attribute set, D = {D1, D2, ..., Dn}, i.e. whether each participating research object has depression or anxiety (the outcome), where D1, D2, ..., Dn represent whether each research object is depressed or anxious; n is the number of samples.
(3) Decision table attribute reduction: from the original variable set (all data from A to G), variables with a null rate (number of null values / total number of values) greater than 80% are excluded, and the remaining variables are reduced.
(4) Fifteen experts with more than 10 years of working experience and with relatively high academic qualifications and professional titles in relevant fields such as chronic disease management, cardiovascular disease, psychological counseling and nursing management are invited to take part, through on-site interviews, in the study of the depression and anxiety risk indexes of chronic heart failure patients, and the index content is finally established.
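Steps (1) and (3) above can be illustrated with the following minimal sketch in plain Python. It is not the implementation of the invention: the handling of values that fall exactly on a bin boundary, the use of information gain as the entropy-based reduction criterion, and the toy data (F1, F2, F3 and the decision column) are all assumptions made for illustration, since the text does not specify them.

```python
import math
from collections import Counter, defaultdict


def discretize(value, f_min, f_max):
    """Step (1): map a value to class 1, 2 or 3 using three equal-width bins."""
    if f_max <= f_min:
        raise ValueError("f_max must be greater than f_min")
    dis = (f_max - f_min) / 3
    if value > f_min + 2 * dis:
        return 1  # upper third
    if value > f_min + dis:
        return 2  # middle third
    return 3  # lower third (assumption: boundary values fall in the lower bin)


def null_rate(column):
    """Number of null values divided by the total number of values."""
    return sum(v is None for v in column) / len(column)


def drop_sparse(table, threshold=0.8):
    """Step (3): exclude variables whose null rate exceeds the threshold."""
    return {name: col for name, col in table.items() if null_rate(col) <= threshold}


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, decision):
    """Entropy of the decision attribute minus its entropy given the column.

    Assumption: null values are treated as a category of their own.
    """
    groups = defaultdict(list)
    for value, outcome in zip(column, decision):
        groups[value].append(outcome)
    n = len(decision)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(decision) - conditional


# Hypothetical toy decision table: 4 samples, 3 already-discretized indexes.
decision = [1, 1, 0, 0]  # 1 = depressed or anxious, 0 = not
table = {
    "F1": [1, 1, 3, 3],
    "F2": [1, 3, 1, 3],
    "F3": [None, None, None, None],  # null rate 100%, excluded
}
kept = drop_sparse(table)
gains = {name: information_gain(col, decision) for name, col in kept.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
```

In this toy table F3 is excluded by the null-rate rule, and F1 ranks above F2 because it fully determines the outcome while F2 carries no information about it.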
Examples
Establishment of the early warning model of comorbid mood disorders in chronic heart failure patients
Data acquisition and processing. a. Data source: historical medical record files of hospitals such as a tertiary hospital in Sichuan Province, involving 50,000 files and more than 90 million medical data records in total. b. Data selection: in view of the characteristics of big data, interactive query and data analysis are performed mainly on the Spark platform, with an Oracle database used as an auxiliary tool for data exploration and data verification. In this research, data processing takes chronic heart failure and the mood disorders it causes as the research target and medical knowledge as guidance, and field attributes irrelevant to the mining target are discarded. c. Data processing: functional programming on Spark's core abstraction, the resilient distributed dataset (RDD), and structured programming on the DataFrame object of the Spark SQL component are adopted, and the data extraction task is implemented in Spark SQL. Data relevant to the research are screened out through data preprocessing, data cleaning, data integration, data transformation and data reduction.
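The DataFrame and Spark SQL part of the extraction task might look like the following sketch. The table name, column names, diagnosis value and Parquet source are hypothetical, since the text does not disclose the schema or storage format; only the calls `spark.read.parquet`, `DataFrame.createOrReplaceTempView` and `spark.sql` are standard PySpark API. The RDD-based functional processing is not shown.

```python
# Hypothetical table and column names; the source does not list the schema.
EXTRACTION_QUERY = """
SELECT patient_id, age, sex, bnp, lvef, sds_score, sas_score
FROM medical_records
WHERE diagnosis = 'chronic heart failure'
"""


def extract_heart_failure_records(spark, source_path):
    """Return a DataFrame holding only the fields relevant to the mining target.

    spark: an existing pyspark.sql.SparkSession.
    source_path: location of the record files (assumed here to be Parquet).
    """
    records = spark.read.parquet(source_path)
    # Register the DataFrame so that it can be queried in Spark SQL.
    records.createOrReplaceTempView("medical_records")
    return spark.sql(EXTRACTION_QUERY)
```

Keeping the query as a single SQL string makes the discarded field attributes easy to review against the medical guidance mentioned above.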
Constructing the early warning model: 70% of the total samples are taken as a sample set and, based on comorbid mood disorder events in chronic heart failure, the Spark MLlib library is used to apply Logistic regression and naive Bayes classification algorithms respectively; the optimal model is selected according to the AUC value, and a risk early warning model with a time span of 1 year is established.
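The selection logic described above can be sketched in plain Python as follows. This is an illustration only and does not use Spark MLlib: the AUC is computed directly from its rank interpretation, the remaining 30% of the samples are assumed to serve as a held-out evaluation set, and the model names and predicted scores are hypothetical.

```python
import random


def split_samples(samples, train_percent=70, seed=0):
    """Shuffle the samples and split them into a training set and a held-out set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Probability that a random positive case scores higher than a random
    negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_best_model(labels, scores_by_model):
    """Return the name of the model with the highest AUC."""
    return max(scores_by_model, key=lambda name: auc(labels, scores_by_model[name]))


# Hypothetical held-out labels and predicted probabilities from two models.
held_out_labels = [1, 0, 1, 0]
predicted = {
    "logistic_regression": [0.9, 0.2, 0.7, 0.4],
    "naive_bayes": [0.6, 0.7, 0.8, 0.1],
}
best = select_best_model(held_out_labels, predicted)
```

In practice the scores would come from models fitted on the training portion; the sketch only shows how the AUC comparison decides between them.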
Constructing the multi-factor Logistic regression model: whether depression and anxiety occur is taken as the dependent variable, and the risk indexes of comorbid mood disorders in chronic heart failure patients are entered into the Logistic regression model as independent variables, so as to estimate the regression coefficient, relative risk and 95% confidence interval of each risk factor.
Classifying the risk factors and determining the value of each category: continuous variables are cut into groups, and the group median is taken as the reference value of each group; categorical variables are set as dummy variables, coded 0 and 1.
Determining the base risk reference value of each risk factor: if a patient's value for a risk factor equals its base reference value, that factor is scored 0 points; the higher the score, the higher the risk.
The distance D of each category of each risk factor from the base risk is calculated in units of the regression coefficient, that is, D equals the regression coefficient multiplied by the distance between the category's reference value and the base risk reference value.
The unit distance B corresponding to 1 point is set.
The score of each category of each risk factor is D/B.
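The scoring steps above can be sketched as follows. The coefficient, standard error, age groups and unit distance are hypothetical values chosen for illustration; reporting exp(coefficient) as the risk estimate with a normal-approximation 95% interval, and rounding D/B to the nearest whole point, are assumptions not stated in the text.

```python
import math


def odds_ratio_with_ci(beta, std_err, z=1.96):
    """Return exp(beta) and its 95% confidence interval (normal approximation)."""
    return (
        math.exp(beta),
        math.exp(beta - z * std_err),
        math.exp(beta + z * std_err),
    )


def category_points(beta, category_value, base_value, unit_distance):
    """Score one category of a risk factor.

    D = beta * (category reference value - base risk reference value)
    points = D / B, rounded to a whole number (rounding is an assumption).
    """
    distance = beta * (category_value - base_value)
    return round(distance / unit_distance)


# Hypothetical example: age with a coefficient of 0.05 per year.
age_beta = 0.05
age_group_medians = {"50-59": 55, "60-69": 65, "70-79": 75}
base_reference = 55  # base risk reference value, scored as 0 points
unit_b = age_beta * 5  # assumption: 5 years of age correspond to 1 point

age_points = {
    group: category_points(age_beta, median, base_reference, unit_b)
    for group, median in age_group_medians.items()
}
# A patient's total score is the sum of the points over all risk factors.
```

With these hypothetical values the base group scores 0 points and each further decade of age adds 2 points.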
Determining risk early warning critical point and making early warning mechanism
The maximum Youden index is determined from sensitivity and specificity in order to determine the critical points for the occurrence of depression and anxiety. In this example, 3 critical points (a, b and c) are selected. Finally, the Delphi method is used to determine the predicted probability of the occurrence of depression and anxiety: a score reaching a triggers a low-risk warning, b a medium-risk warning and c a high-risk warning, and the intervention method for each warning level is determined as well.
The expert selection criteria are: more than 10 years of working experience in relevant fields such as chronic disease management, cardiovascular disease, psychological counseling and nursing management; relatively high academic qualifications and professional titles; strong enthusiasm for the project; and the ability to return the correspondence consultation results and answer questions in a timely manner. The number of experts is at least 15, and the early warning mechanism is formulated through 2-4 rounds of expert consultation.
Sensitivity, specificity and the Youden index are three indexes for evaluating the validity of a screening method.
Sensitivity is the proportion of actual patients whom the screening method correctly judges to be patients;
specificity is the proportion of people who are actually not ill whom the screening method correctly judges to be not ill;
the Youden index, also called the correct index, is a measure for evaluating the validity of a screening test, and can be applied when the harm of false negatives (missed diagnosis rate) and false positives (misdiagnosis rate) is assumed to be of equal significance. The Youden index is the sum of sensitivity and specificity minus 1, and indicates the overall ability of the screening method to identify true patients and non-patients. The larger the index, the better the screening test and the greater its validity.
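The three indexes defined above can be computed as in the following minimal sketch, which also picks the score cutoff that maximizes the Youden index. The labels and total risk scores are hypothetical, and treating a score greater than or equal to the cutoff as a positive screen is an assumption. The sketch yields a single cutoff; choosing the three critical points a, b and c and the warning levels is left to the expert consultation described in the text.

```python
def sensitivity_specificity(labels, scores, cutoff):
    """Sensitivity and specificity when score >= cutoff counts as a positive screen."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_index(sensitivity, specificity):
    """Youden index = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


def best_cutoff(labels, scores):
    """Return the observed score that maximizes the Youden index."""
    candidates = sorted(set(scores))
    return max(
        candidates,
        key=lambda c: youden_index(*sensitivity_specificity(labels, scores, c)),
    )


# Hypothetical data: 1 = depression or anxiety occurred, 0 = did not occur.
outcome = [0, 0, 0, 1, 1, 1, 1, 1]
total_score = [1, 2, 4, 3, 5, 6, 7, 8]

cutoff = best_cutoff(outcome, total_score)
sens, spec = sensitivity_specificity(outcome, total_score, cutoff)
j = youden_index(sens, spec)
```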