CN109461480B - Incremental updating method for hospital infection data loss - Google Patents
- Publication number
- CN109461480B
- Authority
- CN
- China
- Prior art keywords
- sample
- data
- patient
- date
- infection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/60—ICT specially adapted for the handling or processing of medical references relating to pathologies
Landscapes
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention discloses an incremental updating method for missing hospital infection data. The method fills in missing values in hospital infection data by incrementally updating each feature according to the effective time range of that feature, and generates a sample set suitable for analysis and modeling of hospital infection data. The beneficial effects are as follows: incremental updating addresses the problem of missing hospital infection data, and classifying features by effective time range addresses the problem that different features remain valid for different lengths of time.
Description
Technical Field
The invention relates to a hospital infection data mining technology, in particular to an incremental updating method for hospital infection data loss in the analysis and modeling process of hospital infection big data.
Background
Hospital infection causes substantial economic loss and casualties every year. Among medical data, hospital infection data is relatively difficult to analyze and model: data quality is poor, samples are hard to construct, and there is no well-established precedent to guide data analysis and modeling. In recent years, large hospitals have begun to establish their own nosocomial infection monitoring information systems, but these monitoring and early-warning systems perform poorly. The main reasons are that analysis and modeling of nosocomial infection big data is difficult and that there are no successful cases to serve as guidance and reference; each existing case solves only a small part of the problem and does not comprehensively explain and analyze the difficulties of nosocomial infection modeling. Solutions for preparing modeling data have also been proposed in the literature, but each has problems.
For example, the literature (forest pine, queen culture, liuwei, etc.; medical examination data preprocessing method research [ J ]. computer application research, 2017,34(4): 1089-.
For another example, the literature (Kotsiantis S B, Kanellopoulos D, Pintelas P E. Data preprocessing for supervised learning [J]. International Journal of Computer Science, 2006, 1(2): 111-) proposes handling missing values with means, special values, and the like. For modeling hospital infection, however, these methods are not well suited. The ultimate goal of modeling is early warning or real-time monitoring of hospital-infected patients and, most importantly, providing the basis for each warning. That basis should show the patient's true values rather than processed values so that the physician can make a reasonable diagnosis; directly modifying values or substituting special values is therefore less suitable in this case.
For another example, the literature (plum red, Liangpei Feng, Pandong peak, and the like. Application research of an autoregressive moving average mixed model in hospital infection incidence prediction [J]. China Hospital Infection journal, 2013, 23(11): 2693) proposes a time series model that monitors the development trend of hospital infection, with the aim of early warning and reducing the risk of hospital infection. However, this early-warning model has obvious disadvantages. First, it monitors the incidence of nosocomial infection only indirectly; this is generally retrospective research, so it is difficult to monitor in advance or in real time and impossible to intervene in and treat nosocomial infection promptly. Second, it is a formula-type calculation model that lacks interpretability, which makes causes hard to analyze. In addition, the model was built on data from Ningxia People's Hospital and has not been extensively tested in other hospitals, so whether it generalizes remains to be verified.
In the process of carrying out the analysis modeling of the hospital infection big data, the encountered difficulties mainly include the following:
(1) The problem of missing hospital infection data. Hospital infection data is time-sensitive, so the time range of a patient's test data must be considered when the data is used. The data also has missing values, which increases the difficulty of analyzing and modeling hospital infection big data;
(2) The problem of dividing hospital infection data into positive and negative samples. Hospital infection data samples fall into two types, infection samples and non-infection samples, and dividing them into positive and negative examples is an important problem. Non-infection samples are easy to obtain: several days of data are randomly extracted from patients without hospital infection. Infection samples are harder to select, because most patients with hospital infection have long hospital stays and are in an infected state for only part of that time, being normal otherwise; obtaining the data for the infected state is therefore difficult. A patient who has been diagnosed or reported as a hospital infection generally has an "infection date" determined by a doctor, herein referred to as the "diagnosis date", indicating that the patient was infected on that day. The simplest approach is to take the day of the "diagnosis date" as the infection sample. In practice, however, this date is inferred by the doctor and is often inaccurate: the date on which the patient actually became infected may be before or after it. Similar problems have been described in the literature (Zhang Wei, Meng Hui, Zheng Jia, etc. Study of different statistical methods of hospital infection miss-reporting rates [J]. China Hospital infectivity, 2006, 1.).
Therefore, there is a need in the art for improvements in the analysis and modeling of hospital infection big data that addresses the above-mentioned deficiencies.
Disclosure of Invention
In view of the above-mentioned defects in the prior art, the technical problem to be solved by the present invention is to provide an incremental updating method for hospital infection data loss to solve the problem of hospital infection data loss in the analysis and modeling process of hospital infection big data.
Before proceeding with the summary of the invention, it is necessary to explain and define terms appearing in the document.
Effective time range: a general term for the period during which a patient's test data remains valid. For example, data such as body temperature, stool frequency, heart rate and respiratory rate is highly time-sensitive and generally differs from day to day, so it can be used within a range of 24 hours, and data older than 24 hours is not considered. Data such as microbiological examination and laboratory examination is less time-sensitive and can be considered valid for three to five days, so it can be used within a range of 72 or 120 hours. This range is referred to as the "effective time range". The effective time range is generally determined from experience or from data in reference documents, and can also be set according to the actual modeling purpose; a suggested standard is the action time of some of the features in the hospital infection diagnosis standard (trial).
The diagnosis date: in a nosocomial infection, a patient who has been diagnosed or reported as a nosocomial infection typically has a "date of infection" that is diagnosed by a physician, referred to herein as the "date of diagnosis".
The infection date: the date when the patient actually developed the infection was the date of infection.
The previous time period: when an infection sample is selected with the diagnosis date as the reference date, the length, in time units, inferred backward from the diagnosis date is the previous time period.
The subsequent time period: when an infection sample is selected with the diagnosis date as the reference date, the length, in time units, inferred forward from the diagnosis date is the subsequent time period.
In order to solve the above problems, the present invention provides an incremental updating method for hospital infection data loss, comprising the following steps:
step 1, determining the characteristics of the hospital infection data, and classifying the characteristics according to their effective time ranges;
step 2, denoting the set of all patients as S, taking a patient m from the set S, and generating a positive and negative example sample set N for patient m;
step 3, sorting the sample set N of step 2 in ascending order of time, so that samples are processed from earliest to latest during incremental updating and a new value always overwrites an old value;
step 4, storing the earliest sample i in the sample set N into a sample set D, assigning its characteristics to the set T determined in step 1, and recording Tk_v and Tk_date, which denote the value of the kth characteristic in the set T for sample i and the date of that value;
step 5, for the second and every subsequent sample i in the sample set N, checking each value for missingness, updating missing values, and retaining values that are not missing;
step 6, storing the updated or retained sample into the sample set D, then reading the next sample in the order of step 5 and storing its data;
step 7, repeating step 5 and step 6 until sample i is the last sample in N, at which point reading is complete and the sample set D is constructed.
It should be noted that the positive example in step 2 is a patient m sample with nosocomial infection, and the negative example is a patient m sample without nosocomial infection.
Further, if m is a patient in the positive example sample, m denotes the mth patient in S; if m is a patient in the negative example sample, m is a randomly drawn patient.
Further, after the positive and negative example sample set N is generated in step 2, it needs to be divided by a time period inference method to identify the samples in an infected state. The specific steps include:
step 2a, denoting the set of hospital-infected patients as C, and the set of diagnosis dates of those patients as Cd;
step 2b, randomly extracting n patients from the set C and obtaining the diagnosis dates corresponding to the n patients;
step 2c, re-diagnosing the n patients of step 2b, and obtaining arrays A_pre and A_end consisting of the "previous time period" and "subsequent time period" values of the n patients;
step 2d, summing each of the two arrays of step 2c and averaging to obtain avg_pre = sum(A_pre)/n and avg_end = sum(A_end)/n; these two averages serve as the two time period inference parameters for all patients in set C, approximating the "previous time period" and "subsequent time period" of every patient in set C;
step 2e, fine-tuning avg_pre and avg_end through testing to obtain the final required values.
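Steps 2c and 2d can be illustrated with a minimal sketch. The function name `infer_periods` and the example values are hypothetical and not part of the original disclosure; the period lengths are assumed to be expressed in days.

```python
def infer_periods(a_pre, a_end):
    """Average the re-diagnosed 'previous' and 'subsequent' period lengths.

    a_pre, a_end: lists of period lengths (assumed to be in days),
    one entry per sampled patient, as in arrays A_pre and A_end.
    """
    if not a_pre or len(a_pre) != len(a_end):
        raise ValueError("a_pre and a_end must be non-empty and of equal length")
    n = len(a_pre)
    avg_pre = sum(a_pre) / n  # avg_pre = sum(A_pre)/n
    avg_end = sum(a_end) / n  # avg_end = sum(A_end)/n
    return avg_pre, avg_end


# Hypothetical re-diagnosis results for n = 4 sampled patients.
a_pre = [2, 3, 1, 2]
a_end = [1, 2, 2, 3]
avg_pre, avg_end = infer_periods(a_pre, a_end)
```

The two averages are only starting values; step 2e still adjusts them through testing.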
Further, in step 5, if the value Tk_v of the feature Tk of sample i is missing, the sample set D is traversed in reverse order to find the values Tk_v and Tk_date of the feature Tk. If the value found in D is not empty and the difference between its Tk_date and the Tk_date of sample i does not exceed the "effective time range", the value is taken and written into Tk_v of sample i to replace the missing value. Reverse-order traversal is required so that the sample being examined in D is always the one closest in time to the current sample; the same applies below.
Further, in step 5, if the value Tk_v of the feature Tk of sample i is missing, the sample set D is traversed in reverse order to find the values Tk_v and Tk_date of the feature Tk; if the value found in D is not empty but the date difference exceeds the "effective time range", the traversal is exited and the kth feature of sample i remains missing.
Further, in step 5, if the value Tk_v of the feature Tk of sample i is missing, the sample set D is traversed in reverse order to find the values Tk_v and Tk_date of the feature Tk; if the value found in D is empty, the traversal continues to the next earlier sample.
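The three cases of step 5 can be sketched for a single feature as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `fill_from_history`, the dictionary layout of the samples in D, and the use of whole days (rather than hours) for the effective time range are all hypothetical.

```python
from datetime import date


def fill_from_history(history, feature, sample_date, valid_days):
    """Search earlier samples, newest first, for a usable value of `feature`.

    history: samples already stored in D, in ascending date order; each maps a
             feature name to (value, date_of_value), with value None if missing.
    Returns (value, date_of_value), or (None, None) if the feature stays missing.
    """
    for earlier in reversed(history):  # reverse order: nearest sample first
        value, value_date = earlier.get(feature, (None, None))
        if value is None:
            continue  # value in D is empty: keep traversing
        if (sample_date - value_date).days <= valid_days:
            return value, value_date  # within the effective time range: use it
        return None, None  # nearest value is too old: exit, stay missing
    return None, None


history = [
    {"temp": (38.5, date(2018, 1, 1))},
    {"temp": (None, None)},
]
filled = fill_from_history(history, "temp", date(2018, 1, 2), valid_days=1)
stale = fill_from_history(history, "temp", date(2018, 1, 3), valid_days=1)
```

Because D is kept in ascending date order, the first non-empty value met in the reverse traversal is the most recent one, so a value outside the range means no older value can qualify either.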
The invention also provides an analysis modeling method for solving the problem of hospital infection data loss through an incremental updating method, which comprises the following steps:
step A1, determining the characteristics of hospital infection data, and classifying the characteristics according to an effective time range;
step A2, determining patients generating positive and negative samples, wherein the positive sample is a patient sample with nosocomial infection, and the negative sample is a patient sample without nosocomial infection;
step A3, dividing positive and negative examples by time period inference, the specific implementation being as described in steps 2a to 2e;
step A4, generating a sample set by the incremental updating method, the specific implementation being as described in the foregoing steps 1 to 7;
step A5, performing analysis and modeling on the final sample set.
The invention also provides an analysis modeling system that solves the problem of missing hospital infection data based on the incremental updating method. The system comprises at least: a database, which stores the set S of all patients and the case data of the patients in S; a sample generating module, which generates sample sets according to sample generating conditions, for example an infected patient set and a non-infected patient set according to the infection status of the patients; a sample dividing module, which divides the sample set generated by the sample generating module into the sample sets required for analysis and modeling; and a data updating module, which updates missing data values through steps 1 to 7.
The invention also provides an implementation method of the analysis modeling system for solving the problem of hospital infection data loss based on the incremental updating method, which comprises the following steps:
step B1, according to the information in the database, sorting out and defining the patient data items needed from the hospital infection data and designing a corresponding XML storage structure;
step B2, the sample generating module arranging the patient data into the required sample format according to the set sampling period, with the data items as features, and generating the required sample set;
in step B2, the nosocomial infection data is arranged into samples, each sample being the data of one patient within one set sampling period, and the features in the samples are incrementally updated according to the incremental updating method described above, finally yielding a sample set composed of multiple samples of multiple patients over the set sampling periods;
step B3, the sample dividing module dividing the sample set according to the final class labels to generate a sample set in which infection samples and non-infection samples are distinguished;
step B4, incrementally updating the divided sample set through the data updating module;
step B5, after the sample set is updated, establishing a model according to a general modeling method.
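The arrangement of patient data into samples in step B2 might look like the following sketch, assuming a sampling period of one day. The function name `build_daily_samples` and the record layout `(day, feature, value)` are hypothetical; `None` is used here to mark a missing value.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_daily_samples(records, features, start, end):
    """Arrange raw records of one patient into one sample per day.

    records: iterable of (day, feature, value) tuples.
    Returns a list of dicts in ascending date order; features with no
    record on a given day are set to None (a missing value).
    """
    by_day = defaultdict(dict)
    for day, feature, value in records:
        by_day[day][feature] = value
    samples = []
    day = start
    while day <= end:
        row = {"date": day}
        for feature in features:
            row[feature] = by_day[day].get(feature)  # None if not recorded
        samples.append(row)
        day += timedelta(days=1)
    return samples


records = [
    (date(2018, 1, 1), "temp", 38.2),
    (date(2018, 1, 2), "crp", 12.0),
]
samples = build_daily_samples(records, ["temp", "crp"], date(2018, 1, 1), date(2018, 1, 2))
```

The missing entries produced this way are the values that the incremental updating method later tries to fill.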
Further, in step B1, the data is stored as XML files. Each file includes the patient's basic information, such as case number, sex, age and infection date; the patient's admission information, such as admission diagnosis, admission department and admission date; and the information recorded for each set sampling period during the stay, such as body temperature, medical orders, laboratory examination, microbiological examination, imaging examination and disease course records. The storage scheme holds the patient's information in a form that is convenient to organize and apply: each item in the XML can be taken out independently and combined with other items, each item carries an accurate time and can therefore be organized in time order, and how the items are used depends on the developer's requirements.
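The patent does not give the XML schema, so the following is only a hypothetical layout showing how one file per patient with timestamped items could be built and read back with Python's standard `xml.etree.ElementTree` module. All element and attribute names (`patient`, `basic`, `admission`, `observations`, `item`, `case_no`, `name`, `time`) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_patient_xml(case_no, sex, age, admission_date, observations):
    """Build one patient record; observations is a list of (name, time, value)."""
    patient = ET.Element("patient", {"case_no": case_no})
    basic = ET.SubElement(patient, "basic")
    ET.SubElement(basic, "sex").text = sex
    ET.SubElement(basic, "age").text = str(age)
    admission = ET.SubElement(patient, "admission")
    ET.SubElement(admission, "admission_date").text = admission_date
    obs = ET.SubElement(patient, "observations")
    for name, time, value in observations:
        # Every item carries its own timestamp so items can be ordered by time.
        ET.SubElement(obs, "item", {"name": name, "time": time}).text = str(value)
    return patient


root = build_patient_xml(
    "0001", "F", 63, "2018-01-01",
    [("crp", "2018-01-02T09:30", 15.2), ("body_temperature", "2018-01-02T08:00", 38.4)],
)
# Items can be taken out independently and organized in time order.
items = sorted(root.iter("item"), key=lambda e: e.get("time"))
xml_text = ET.tostring(root, encoding="unicode")
```

Sorting on the ISO-style `time` attribute works as plain string comparison, which keeps the reading side simple.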
The present invention also provides a computer-readable medium for incremental data update and hospital infection data analysis and modeling over a computer network, comprising a set of instructions that, when executed, cause at least one computer to perform the steps of solving the problem of incremental updates in the hospital infection data analysis modeling process and utilizing the updated data analysis and modeling.
By implementing the incremental updating method for hospital infection data deletion provided by the invention, the method has the following technical effects:
(1) Incremental updating is adopted to solve the problem of missing data and of using real-time data. Prior methods for handling missing and real-time hospital infection data mostly assess the missing values and directly delete samples with many missing values. This is unreasonable: for real-time data, a sample with many missing values may still contain a few values of reference value. By adopting incremental updating, the method resolves most of the missing data.
(2) The method for classifying different characteristics according to the effective time range solves the problem of different time effectiveness lengths of different characteristics.
(3) Time period inference is adopted to solve the problem that samples are difficult to divide because infection dates are inaccurate. In the prior art, hospital infection samples are divided in units of patients, and acquiring infection data requires a large amount of manual review. The method divides infection and non-infection samples in units of days and distinguishes the two types of samples by time period, which resolves the difficulty of selecting and dividing samples.
(4) The data is stored as XML, which addresses the problem that hospital data is complex and difficult to use. Prior methods for processing hospital infection data mostly process and analyze data exported through a database and related programs, without designing a relatively general data structure specifically for data storage and processing. The XML scheme is convenient for storage and processing and manages data in units of patients: all the specific information of each patient is collected into one file, which benefits data management, makes retrospective study of the data convenient for research and development staff, and greatly facilitates data application.
(5) The basic flow and the main difficulties of analysis and modeling of hospital infection big data are described more clearly, clarifying the basic approach for analysis and modeling of hospital infection data.
Drawings
The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.
FIG. 1 is a flow chart for modeling analysis of nosocomial infection data in an embodiment of the present invention;
FIG. 2 is a flow chart of time period inference in an embodiment of the invention;
FIG. 3 is a flow chart of incremental update in an embodiment of the present invention;
FIG. 4 is a flow chart of a method for implementing the analytical modeling system in an embodiment of the present invention;
FIG. 5 shows the effect times of some of the features in the criteria for diagnosis of infection in hospitals (trial).
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following embodiments, the term "effective time range" is a general term for the period during which a patient's test data remains valid. For example, data such as body temperature, stool frequency, heart rate and respiratory rate is highly time-sensitive and generally differs from day to day, so it can be used within a range of 24 hours, and data older than 24 hours is not considered. Data such as microbiological examination and laboratory examination is less time-sensitive and can be considered valid for three to five days, so it can be used within a range of 72 or 120 hours. This range is called the "effective time range". The effective time range is generally determined from experience or from data in reference documents, and can also be set according to the actual modeling purpose; a suggested standard is the action time of some of the features in the hospital infection diagnosis standard (trial).
"date of diagnosis" means that a patient who has been diagnosed or reported as a nosocomial infection will generally have a "date of infection" diagnosed by a doctor, and is referred to herein as the "date of diagnosis".
By "date of infection" is meant the date on which the patient actually developed the infection.
The "previous time period" refers to the length, in time units, inferred backward from the diagnosis date when an infection sample is selected with the diagnosis date as the reference date.
The "subsequent time period" refers to the length, in time units, inferred forward from the diagnosis date when an infection sample is selected with the diagnosis date as the reference date.
Fig. 1 shows a modeling process for analyzing hospital infection data, which comprises the following steps:
step A1, determining the characteristics of the hospital infection data, such as body temperature, pulse and C-reactive protein, to form a hospital infection feature set, denoted F, where k denotes the kth feature in F; classifying the feature set F according to effective time range to generate a set T, where Tk denotes the category of the kth feature;
the purpose of the "effective time range" is to reflect that different features act on the human body for different lengths of time. It is generally determined from experience or from data in reference documents, and can also be set according to the actual modeling purpose; FIG. 5 is suggested as the standard. The feature set is determined mainly from the features summarized in the hospital infection diagnosis standard (trial), together with features obtained from papers or from doctors; this work is mainly completed in the requirements investigation and analysis stage.
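As an illustration of step A1, the classification of features by effective time range can be held in a simple lookup table. The feature names and the function `classify_by_range` are hypothetical; the 24-hour values follow the examples in the text, while assigning 72 hours to microbiological examination and 120 hours to laboratory examination is an assumption, since the text only says such data may use 72 or 120 hours.

```python
# Effective time range per feature, in hours (illustrative values only).
EFFECTIVE_HOURS = {
    "body_temperature": 24,
    "stool_frequency": 24,
    "heart_rate": 24,
    "respiratory_rate": 24,
    "microbiological_exam": 72,   # assumed; the text allows 72 or 120
    "laboratory_exam": 120,       # assumed; the text allows 72 or 120
}


def classify_by_range(effective_hours):
    """Group features by effective time range, giving one category per range."""
    groups = {}
    for feature, hours in effective_hours.items():
        groups.setdefault(hours, []).append(feature)
    return groups


groups = classify_by_range(EFFECTIVE_HOURS)
```

In practice the table would be filled in from FIG. 5 or from clinical experience, as the text describes.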
Step A2, determining the patients from which positive and negative samples are generated, where positive samples are samples of patients with hospital infection and negative samples are samples of patients without hospital infection. First, the patients with hospital infection are obtained. This is straightforward because such patients have been diagnosed by the hospital or reported, so the patients and their corresponding diagnosis dates can be obtained directly. Next, the patients without hospital infection are obtained, namely inpatients who were not diagnosed with hospital infection. Because there are many such patients, stratified sampling is combined with random sampling: the inpatients are stratified by department, and some patients are then drawn from each stratum by random sampling. The number of patients finally drawn is generally no more than 10 times the number of patients with hospital infection;
it should be noted that this step only determines which patients are hospital-infected and which are not. These patients are not themselves the samples used for modeling, because a whole patient is not suitable as a sample: each infected patient is in an infected state for only part of the hospital stay and is normal at other times, and only the time during which the patient is infected can be used as an infection sample. In other words, the samples are time-series in nature.
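The stratified plus random sampling of non-infected patients in step A2 could be sketched as below. The function name, the per-department sampling `fraction`, and the fixed `seed` are hypothetical choices; only the stratification by department and the cap of 10 times the number of infected patients come from the text.

```python
import random


def sample_non_infected(patients_by_dept, n_infected, fraction=0.1, max_ratio=10, seed=0):
    """Draw non-infected patients per department, capped at max_ratio * n_infected.

    patients_by_dept: dict mapping department name -> list of patient ids.
    fraction: share of each department to draw (an assumed parameter).
    """
    rng = random.Random(seed)
    chosen = []
    for dept in sorted(patients_by_dept):  # one stratum per department
        patients = patients_by_dept[dept]
        k = min(len(patients), max(1, round(len(patients) * fraction)))
        chosen.extend(rng.sample(patients, k))  # random sampling within the stratum
    cap = max_ratio * n_infected
    if len(chosen) > cap:
        chosen = rng.sample(chosen, cap)  # enforce the 10x upper bound
    return chosen


depts = {
    "ICU": ["icu-%d" % i for i in range(50)],
    "surgery": ["sur-%d" % i for i in range(100)],
}
picked_small = sample_non_infected(depts, n_infected=1)
picked_large = sample_non_infected(depts, n_infected=5)
```

Trimming to the cap by a second random draw is one simple option; it does not preserve the per-department proportions exactly.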
Step A3, dividing positive and negative example samples by time period inference. After the hospital-infected and non-hospital-infected patients are determined, positive and negative samples can be generated as time series. In this case, samples are mainly generated in units of days, so each day of a patient's hospital stay can serve as a sample; however, samples are not generated from every day of the stay. For a non-hospital-infected patient, the data of some days of the stay can be extracted by random sampling. For a hospital-infected patient, the data of the corresponding time period can be extracted by time period inference. The previous and subsequent time periods used in time period inference need to be tried several times during positive and negative sample division to find reasonable values, and each of the two periods is generally suggested to be no more than 5 days. The time period inference process is shown in FIG. 2 and includes:
step A3a, denoting the set of hospital-infected patients as C, and the set of their diagnosis dates as Cd;
step A3b, randomly extracting n patients from the set C, and obtaining the diagnosis dates corresponding to the n patients;
step A3c, re-diagnosing the n patients according to the hospital infection diagnosis standard (trial) to obtain arrays A_pre and A_end composed of the "previous time period" and "subsequent time period" values of the n patients; summing and averaging each array to obtain avg_pre = sum(A_pre)/n and avg_end = sum(A_end)/n, and using these averages as the two time period inference parameters for all patients in C;
step A3d, generating a sample set by the incremental updating method and carrying out a modeling test;
step A3e, continually fine-tuning avg_pre and avg_end according to the test results, for example by adding 1 to or subtracting 1 from both at the same time, to finally obtain values that give better results.
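To make the use of the two parameters concrete, the sketch below expands a diagnosis date into the days treated as infection samples and lists the simultaneous +1/-1 adjustments mentioned in the fine-tuning step. The function names and the rounding of the averages to whole days are assumptions; the 5-day upper bound follows the suggestion in step A3.

```python
from datetime import date, timedelta


def infection_window(diagnosis_date, pre_days, end_days):
    """Days from (diagnosis date - pre_days) to (diagnosis date + end_days), inclusive."""
    start = diagnosis_date - timedelta(days=pre_days)
    return [start + timedelta(days=i) for i in range(pre_days + end_days + 1)]


def tuning_candidates(avg_pre, avg_end, max_days=5):
    """Candidate (pre, end) pairs: rounded averages shifted by -1, 0 and +1 together."""
    base_pre, base_end = round(avg_pre), round(avg_end)
    pairs = []
    for shift in (-1, 0, 1):
        pre, end = base_pre + shift, base_end + shift
        if 0 <= pre <= max_days and 0 <= end <= max_days:
            pairs.append((pre, end))
    return pairs


window = infection_window(date(2018, 3, 10), pre_days=2, end_days=1)
candidates = tuning_candidates(2.0, 2.0)
```

Each candidate pair would then be evaluated by generating a sample set and running the modeling test, keeping the pair that performs best.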
Step A4, after the positive and negative samples are divided, generating a sample set by the incremental updating method. This step is the same as step A3d: different features are incrementally updated according to the "effective time range" to which they were assigned in step A1. Because the positive samples obtained by applying time period inference to hospital-infected patients are continuous in time, the method can resolve most missing data for them. For non-hospital-infected patients, however, random sampling makes continuity in time hard to guarantee, so incremental updating does not necessarily resolve missing data; this case must be handled according to the actual situation, and if there are too many missing values, randomly selecting several consecutive days can be considered when sampling non-hospital-infected patients. FIG. 3 shows the method for processing missing sample values by incremental updating, with the following specific steps:
step A4a, denoting the set of all patients from the aforementioned step A3 as S, with m being the m-th patient in S; traversing S to obtain a hospital-infected patient m, applying "time period reasoning" to m to generate a positive and negative sample set N, and sorting N in ascending order of date; the sorting ensures that samples are processed from earliest to latest during the incremental update, so that a newer value always overwrites an older one; if patient m is a non-infected patient, the sample set N is generated by random sampling instead;
step A4b, beginning to traverse the sample set N; the first sample i, which has the earliest time, is stored directly in a sample set D, its features are recorded in a set T, and Tk_v and Tk_date are recorded, denoting the value of the k-th feature of sample i and the date of that value;
step A4c, traversing the second and all subsequent samples i and checking the value Tk_v of each feature Tk in i; if the value is missing, performing step A4d; otherwise keeping the value without any processing;
step A4d, if the value Tk_v of the feature Tk of sample i is missing, searching the sample set D in reverse order for the value Tk_v and date Tk_date of the feature Tk; if the value in D is not empty and the difference between Tk_date in D and Tk_date of sample i does not exceed the "valid time range", taking the value out and writing it into Tk_v of sample i to replace the missing value; if the value in D is not empty but the difference exceeds the "valid time range", exiting the traversal and keeping the k-th feature of sample i in the missing state; if the value in D is also empty, continuing the traversal to the next value. The reverse-order traversal ensures that the samples visited in set D are always those closest in time to the current sample;
step A4e, after the update or retention, storing the sample in the sample set D and reading the next sample, that is, i = i + 1;
step A4f, judging whether i has reached the end of the sample set N (i = N); if so, the traversal is finished and the construction of the sample set D is complete; if not, continuing with the next sample.
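As an illustration only, steps A4a to A4f for a single patient can be sketched in Python as below. The feature names, the day-based valid ranges and the dictionary layout of a sample are assumptions made for this sketch and do not appear in the original text. The sketch also assumes that Tk_date is the date on which a value was actually observed, so that a carried-forward value cannot be propagated beyond its valid time range.

```python
from datetime import date, timedelta

# Hypothetical valid time ranges per feature; the values are illustrative only.
VALID_RANGE = {
    "temperature": timedelta(days=1),
    "wbc_count": timedelta(days=3),
}


def incremental_update(samples, valid_range):
    """Fill missing feature values from earlier samples of the same patient.

    `samples` is a list of dicts {"date": date, "features": {name: value or None}}.
    Returns the sample set D in ascending date order.
    """
    ordered = sorted(samples, key=lambda s: s["date"])  # ascending order (step A4a)
    D = []
    for sample in ordered:
        values = dict(sample["features"])
        # Date on which each value was observed (Tk_date); None while missing.
        dates = {k: (sample["date"] if v is not None else None)
                 for k, v in values.items()}
        for name in list(values):
            if values[name] is not None:
                continue  # observed value: keep it unchanged (step A4c)
            # Walk D backwards so the first non-empty value is the most recent (step A4d).
            for prev in reversed(D):
                prev_value = prev["features"].get(name)
                if prev_value is None:
                    continue  # empty in this earlier sample: look further back
                if sample["date"] - prev["dates"][name] <= valid_range[name]:
                    values[name] = prev_value
                    dates[name] = prev["dates"][name]
                break  # either filled, or out of range and left missing
        D.append({"date": sample["date"], "features": values, "dates": dates})  # step A4e
    return D


samples = [
    {"date": date(2018, 3, 1), "features": {"temperature": 38.5, "wbc_count": 12.0}},
    {"date": date(2018, 3, 2), "features": {"temperature": None, "wbc_count": None}},
    {"date": date(2018, 3, 3), "features": {"temperature": None, "wbc_count": None}},
    {"date": date(2018, 3, 5), "features": {"temperature": None, "wbc_count": None}},
]
D = incremental_update(samples, VALID_RANGE)
```

In this made-up input, the temperature is carried forward to the second day only, while the white cell count is carried forward to the third day and is missing again on the last day.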
Step A5: analyzing, modeling, testing and optimizing the final sample set. This step is the same as the aforementioned step A3e. Once the sample set has been generated, the main difficulties of hospital infection data are essentially resolved, and the subsequent work can largely follow the standard process of data analysis and machine learning. Note, however, that the machine learning algorithm cannot be chosen arbitrarily: the early warning results of a hospital infection monitoring and early warning model generally need to be interpretable, that is, each warning must be supported by evidence. The algorithm must therefore be an interpretable one, such as a decision tree, random forest or logistic regression; deep learning, support vector machines and similar algorithms are not recommended. The modeling and testing process is shown in fig. 3; it uses conventional algorithms and steps, as follows:
step A5a, modeling the sample set D, preferably with an interpretable algorithm such as a decision tree, random forest or logistic regression, and recording the sensitivity and specificity of the model on a test set;
step A5b, after recording the sensitivity and specificity, fine-tuning avg_pre and avg_end, modeling and testing again, and recording the two indexes again;
step A5c, repeating the modeling and testing several times and finding the run with the best two indexes; the avg_pre and avg_end used in that run are taken as approximately optimal values;
after the final model is constructed, it can be integrated into the online system; the integration differs considerably from system to system, but the model itself is generally applicable.
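The tuning loop of steps A5a to A5c can be sketched as follows. This is a minimal sketch under stated assumptions: `evaluate` is a hypothetical caller-supplied function that rebuilds the sample set for a given (avg_pre, avg_end) pair, trains the model and returns test-set labels and predictions, and ranking the runs by the sum of sensitivity and specificity is a choice made here, since the original text only says the two indexes with the best effect are selected.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = infected, 0 = not infected."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def tune_periods(candidates, evaluate):
    """Try each (avg_pre, avg_end) pair and keep the best-scoring run.

    Returns (score, avg_pre, avg_end, sensitivity, specificity), or None if
    `candidates` is empty. The score is an assumed ranking criterion.
    """
    best = None
    for avg_pre, avg_end in candidates:
        y_true, y_pred = evaluate(avg_pre, avg_end)  # rebuild samples, train, test
        sens, spec = sensitivity_specificity(y_true, y_pred)
        score = sens + spec
        if best is None or score > best[0]:
            best = (score, avg_pre, avg_end, sens, spec)
    return best
```

A caller would pass a small grid of values around the initial averages, for example the initial pair and its neighbours one time unit apart.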
The invention also provides an analysis modeling system for solving the problem of hospital infection data loss based on the incremental updating method. The system comprises at least: a database, which stores the set S of all patients and the case data of the patients in S; a sample generating module, which generates sample sets according to the sample generating conditions, for example an infected patient set and a non-infected patient set according to each patient's infection status; a sample dividing module, which divides the sample sets generated by the sample generating module into the sample sets required for analysis and modeling; and a data updating module, which updates missing data values through steps A4a to A4f.
An implementation method of an analysis modeling system for solving hospital infection data loss based on an incremental updating method is shown in fig. 4, and includes the following steps:
step B1, according to the information in the database, sorting and defining the patient data items required from the hospital infection data and designing a corresponding XML storage structure;
step B2, the sample generating module arranges the patient data into the required sample format, using the set sampling period and taking the data items as features, and generates the required sample set;
in step B2, the hospital infection data are arranged into samples, each of which holds the data of one patient in one set sampling period, and the features in the samples are incrementally updated according to the incremental updating method described above, finally producing a sample set made up of the samples of multiple patients in the set sampling periods;
step B3, the sample dividing module divides the sample set according to the final classification labels, producing a sample set in which infection samples and non-infection samples are distinguished;
step B4, the divided sample set is incrementally updated by the data updating module;
step B5, after the sample set is updated, a model is established according to a general modeling method.
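The arrangement of patient data into samples in step B2 can be illustrated with the short sketch below. It assumes a sampling period of one calendar day, a record layout of (timestamp, feature name, value) and a rule that the latest value within a day wins; none of these details are specified in the original text.

```python
from collections import defaultdict
from datetime import datetime


def build_daily_samples(records, feature_names):
    """Arrange one patient's timestamped records into one sample per calendar day.

    `records` is a list of (timestamp, feature_name, value) tuples.
    Features with no record on a given day are set to None (missing).
    Days without any record produce no sample.
    """
    by_day = defaultdict(dict)
    for ts, name, value in sorted(records, key=lambda r: r[0]):
        by_day[ts.date()][name] = value  # later records overwrite earlier ones
    return [
        {"date": day, "features": {n: by_day[day].get(n) for n in feature_names}}
        for day in sorted(by_day)
    ]


records = [
    (datetime(2018, 3, 9, 8, 0), "temperature", 37.1),
    (datetime(2018, 3, 9, 20, 0), "temperature", 38.4),
    (datetime(2018, 3, 10, 10, 30), "wbc_count", 11.2),
]
daily = build_daily_samples(records, ["temperature", "wbc_count"])
```

The missing entries left as None in such samples are the ones that the data updating module would then attempt to fill in step B4.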
Further, in step B1, the data are stored as an XML file. The file contains the patient's basic information, such as case number, sex, age and infection date; the basic information of the admission, such as admission diagnosis, admission department and admission date; and the information recorded for the patient within the set sampling period during hospitalization, such as body temperature, medical orders, laboratory examinations, microbiological examinations, imaging examinations and disease course records. The storage scheme serves to hold the patient's information and mainly makes the data easier to organize and use: each item in the XML can be taken out independently and combined with other items, and each item carries an exact time, so the items can be organized in chronological order. How they are used depends on the developer's requirements.
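A possible shape for such a record, and one way of reading it back in time order, is sketched below. The element names, attribute names and all values are invented for illustration; the original text does not define the XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout with made-up values; not the schema of the original system.
RECORD_XML = """
<patient case_no="0001">
  <basic sex="F" age="63" infection_date="2018-03-12"/>
  <admission department="ICU" date="2018-03-05" diagnosis="pneumonia"/>
  <observations>
    <item name="temperature" time="2018-03-10T08:00:00" value="38.4"/>
    <item name="temperature" time="2018-03-09T08:00:00" value="37.1"/>
    <item name="wbc_count" time="2018-03-09T10:30:00" value="11.2"/>
  </observations>
</patient>
"""


def items_in_time_order(xml_text, name=None):
    """Return (time, name, value) tuples, optionally for one feature, sorted by time."""
    root = ET.fromstring(xml_text.strip())
    items = [
        (el.get("time"), el.get("name"), float(el.get("value")))
        for el in root.iter("item")
        if name is None or el.get("name") == name
    ]
    # ISO 8601 timestamps of equal format sort chronologically as strings.
    return sorted(items)


temperatures = items_in_time_order(RECORD_XML, "temperature")
```

Because every item carries its own timestamp, a single feature can be extracted on its own or merged with others into per-period samples.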
A computer-readable medium for incremental data update and hospital infection data analysis and modeling over a computer network, comprising a set of instructions that, when executed, cause at least one computer to perform the incremental update in the hospital infection data analysis and modeling process and to carry out analysis and modeling using the updated data.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.
Claims (10)
1. An incremental updating method for hospital infection data loss is characterized by comprising the following steps:
step 1, determining the characteristics of hospital infection data, wherein a characteristic set is marked as F, and k represents the kth characteristic in the set F;
each feature k has a corresponding valid time range, denoted V_k, meaning that at detection time T_M only the feature F_k within the time period from T_M - V_k to T_M is considered valid data for patient testing; for highly time-sensitive data such as body temperature, stool frequency, heart rate and respiratory frequency, V_k = 24 hours is set, so that data earlier than T_M - 24 hours are not used; for less time-sensitive data such as microbiological examinations and laboratory examinations, V_k = 72 hours or V_k = 120 hours is set; the valid time range V_k is set with reference to clinical experience and relevant literature and is determined according to the actual modeling purpose, the references including the action times of some features given in the Hospital Infection Diagnosis Standards (Trial);
after the valid time ranges V_k are determined, features with the same V_k are grouped together;
step 2, recording a set composed of all patients as S, obtaining a patient m in the set S, and generating a positive and negative sample set N for the patient m;
step 3, sorting the positive and negative sample set N from step 2 in ascending order of time, from earliest to latest;
step 4, storing the sample i with the earliest time in the sample set N into a sample set D, storing it correspondingly into a set T according to the features of the hospital infection data determined in step 1, and recording Tk_v and Tk_date, which denote the value of the k-th feature in the set T corresponding to sample i in the set N and the date of that value;
step 5, carrying out missing value judgment on the second and all the subsequent samples i in the sample set N, updating the missing values, and reserving the un-missing values;
step 6, storing the updated or reserved samples into a sample set D, reading subsequent samples according to the sequence of the step 5 and storing sample data;
step 7, repeating step 5 and step 6 until i equals N, at which point the reading is finished and the construction of the sample set D is complete;
wherein the positive sample in step 2 is a sample of a patient m with hospital infection, and the negative sample is a sample of a patient m without hospital infection.
2. The method of claim 1, wherein if m is a patient in the positive sample, m is the m-th patient in S; if m is a patient in the negative sample, m is a randomly drawn patient.
3. The incremental updating method for hospital infection data loss according to claim 1, wherein after the positive and negative example sample set N is generated in step 2, the samples in the infection state are divided for the positive and negative example sample N by adopting a time period reasoning manner, and the specific steps include:
step 2a, recording a set of hospital infected patients as C, and recording a set of diagnosis dates of the infected patients as Cd;
step 2b, randomly extracting n patients from the set C and obtaining diagnosis dates corresponding to the n patients;
step 2c, examining the n patients from step 2b and obtaining, for each of them, the preceding time period TR_before and the subsequent time period TR_after, which form the arrays A_pre and A_end respectively;
step 2d, summing each of the two arrays from step 2c and averaging, giving the two averages avg_pre = sum(A_pre)/n and avg_end = sum(A_end)/n; these two averages serve as the two parameters of time period inference for all patients in set C, approximating the "preceding time period" and "subsequent time period" of all patients in set C;
step 2e, fine-tuning avg_pre and avg_end through testing to obtain the final required values;
the preceding time period TR_before is the length, in time units, inferred backward from the diagnosis date, which serves as the reference date when selecting infection samples; the subsequent time period TR_after is the length, in time units, inferred forward from the same reference date; for example, for a patient in set C, if the time of the earliest feature before the diagnosis time T_diagnose is T_date_before and the time of the latest feature after the diagnosis time T_diagnose is T_date_after, then TR_before = T_diagnose - T_date_before and TR_after = T_date_after - T_diagnose.
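As a worked illustration of steps 2c and 2d, the averages can be computed as in the sketch below. It assumes that time is measured in whole days and that the dates of the infection-related features are already known for each sampled patient; the function names and the input layout are hypothetical.

```python
from datetime import date


def period_lengths(diagnosis, feature_dates):
    """Return (TR_before, TR_after) in days for one patient.

    TR_before = T_diagnose - T_date_before and TR_after = T_date_after - T_diagnose,
    clamped at zero when no feature lies on that side of the diagnosis date.
    """
    tr_before = max(0, (diagnosis - min(feature_dates)).days)
    tr_after = max(0, (max(feature_dates) - diagnosis).days)
    return tr_before, tr_after


def average_periods(patients):
    """Return (avg_pre, avg_end) over (diagnosis_date, feature_dates) pairs."""
    a_pre, a_end = [], []
    for diagnosis, feature_dates in patients:
        before, after = period_lengths(diagnosis, feature_dates)
        a_pre.append(before)
        a_end.append(after)
    n = len(patients)
    return sum(a_pre) / n, sum(a_end) / n


sampled = [
    (date(2018, 3, 10), [date(2018, 3, 7), date(2018, 3, 12)]),
    (date(2018, 4, 1), [date(2018, 3, 31), date(2018, 4, 5)]),
]
avg_pre, avg_end = average_periods(sampled)
```

For these two made-up patients the preceding periods are 3 and 1 days and the subsequent periods are 2 and 4 days, so avg_pre is 2.0 and avg_end is 3.0 before any fine-tuning in step 2e.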
4. The incremental updating method for hospital infection data loss according to claim 1, wherein in step 5, if the value Tk_v of the feature Tk of sample i is a missing value, the value Tk_v and the date Tk_date of the feature Tk are searched for in the sample set D in reverse order; if Tk_v and Tk_date in the sample set D are not empty and the difference between Tk_date in D and Tk_date in i does not exceed the "valid time range", Tk_v is taken out and written into Tk_v of sample i to replace the missing value.
5. The incremental updating method for hospital infection data loss according to claim 1, wherein in step 5, if the value Tk_v of the feature Tk of sample i is a missing value, the value Tk_v and the date Tk_date of the feature Tk are searched for in the sample set D in reverse order; if Tk_v and Tk_date in the sample set D are not empty but the difference exceeds the "valid time range", the traversal is exited and the k-th feature of sample i is kept in the missing state.
6. The incremental updating method for hospital infection data loss according to claim 1, wherein in step 5, if the value Tk_v of the feature Tk of sample i is a missing value, the value Tk_v and the date Tk_date of the feature Tk are searched for in the sample set D in reverse order; if Tk_v and Tk_date in the sample set D are empty, the traversal continues to the next value.
7. An analytical modeling method for addressing hospital infection data loss by the incremental update method of claim 3, comprising the steps of:
step A1, determining the characteristics of hospital infection data, and classifying the characteristics according to an effective time range;
step A2, determining patients generating positive and negative samples, wherein the positive sample is a patient sample with nosocomial infection, and the negative sample is a patient sample without nosocomial infection;
step A3, dividing the positive and negative samples by time period reasoning, the specific implementation being as in steps 2a to 2e;
step A4, generating a sample set by the incremental updating method, the specific implementation being as in steps 1 to 7;
step A5, performing analysis and modeling on the final sample set.
8. An analytical modeling system for solving hospital infection data loss by the incremental update method according to any one of claims 1-6, comprising at least: a database, which stores the set S of all patients and the case data of the patients in S; a sample generating module, which generates sample sets according to the sample generating conditions, for example an infected patient set and a non-infected patient set according to each patient's infection status; a sample dividing module, which divides the sample sets generated by the sample generating module into the sample sets required for analysis and modeling; and a data updating module, which updates missing data values through steps 1 to 7.
9. A method of implementing an analytical modeling system that addresses hospital infection data loss through the incremental update method of claim 8, comprising the steps of:
step B1, according to the information of the database, the patient data items needed in the hospital infection data are sorted and defined and a corresponding XML storage structure is designed;
step B2, the sample generating module arranges the patient data into the required sample format, using the set sampling period and taking the data items as features, and generates the required sample set;
in step B2, the hospital infection data are arranged into samples, each of which holds the data of one patient in one set sampling period, and the features in the samples are incrementally updated according to the incremental updating method described above, finally producing a sample set made up of the samples of multiple patients in the set sampling periods;
step B3, the sample dividing module divides the sample set according to the finally classified labels to generate a sample set after the final infection sample and the non-infection sample are distinguished;
step B4, the divided sample set is updated incrementally through a data updating module;
and step B5, after the sample set is updated, establishing a model according to a general modeling method.
10. A computer-readable medium for incremental data update and hospital infection data analysis and modeling over a computer network, comprising a set of instructions that, when executed, cause at least one computer to perform the steps of solving the problem of incremental update in a hospital infection data analysis modeling process as claimed in any one of claims 1-6 and utilizing the updated data analysis and modeling.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811129962 | 2018-09-27 | ||
CN2018111299621 | 2018-09-27 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109461480A (en) | 2019-03-12 |
CN109461480B (en) | 2022-06-14 |
Family
ID=65607852
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811219378.5A Active CN109461480B (en) | 2018-09-27 | 2018-10-19 | Incremental updating method for hospital infection data loss |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109461480B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112002383B (en) * | 2020-06-30 | 2024-03-08 | 杭州杏林信息科技有限公司 | Automatic management method and system for number of people in hospital infection state in specific period |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009029599A1 (en) * | 2007-08-25 | 2009-03-05 | Quantum Leap Research, Inc. | A scalable, computationally efficient and rapid simulation suited to decision support, analysis and planning |
CN104750830A (en) * | 2015-04-01 | 2015-07-01 | 东南大学 | Cycle Mining Method for Time Series Data |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150363568A1 (en) * | 2014-06-17 | 2015-12-17 | RightCare Solutions, Inc. | Systems and methods for assessing patient readmission risk and selecting post-acute care intervention |
-
2018
- 2018-10-19 CN CN201811219378.5A patent/CN109461480B/en active Active
Non-Patent Citations (2)
Title |
---|
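Time series analysis comparing mandatory and voluntary notification of newly diagnosed HIV infections in a city with a concentrated epidemic; Juliana M Reyes-Uruena et al.; BMC Public Health; 2013-04-12; pp. 1-8 * |
Spatiotemporal characteristics of the input and output flows of disease transmission: the case of the SARS epidemic in Beijing; Hu Bisong et al.; Scientia Sinica Terrae; 2013-09-20; Vol. 43, No. 9; pp. 1499-1517 * |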
Also Published As
Publication number | Publication date |
---|---|
CN109461480A (en) | 2019-03-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||