CN110827945B - Control method for generating key factors based on medical data - Google Patents
- Publication number
- CN110827945B (application CN201810952497.5A)
- Authority
- CN
- China
- Prior art keywords
- variable
- key
- data
- semantic
- variables
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H15/00—ICT specially adapted for medical reports, e.g. generation or transmission thereof
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention provides a control method for generating key factors based on medical data, comprising the following steps: a. defining a data model $\mathrm{mod} = \langle M, R, \phi \rangle$, where $M$ denotes a variable set composed of a plurality of key variables, $M = \{m_1, m_2, \ldots, m_n\}$; $R$ denotes the set of semantic relationships among the key variables, $R = \{r_1, r_2, \ldots, r_n\}$; and $\phi$ denotes the association function corresponding to the semantic relationships between the key variables, associating each $r_i \in R$ with an ordered pair $\langle m_q, m_p \rangle \in M \times M$, in which $m_q$ denotes the start variable and $m_p$ denotes the end variable; b. defining a data set $D = \{d_1, d_2, \ldots, d_n\}$ composed of a plurality of basic medical data; if $m_i = \mathrm{mod}(d_i)$, where $m_i \in M$ and $d_i \in D$, then $d_i$ is a key factor. Through this algorithm the invention generates key factors from the basic medical data and provides a basis for the standardized processing of medical big data.
Description
Technical Field
The invention relates to the field of big data processing, in particular to a method for processing marking points of medical big data, and specifically relates to a control method for generating key factors based on medical data.
Background
With the advent of the big data age, many different types of data are collected and processed. Medical data is a distinctive type: it contains a wide variety of variables, including patient data, physician data, disease data, symptom data, test data, diagnostic data, treatment data, drug data, and the like. Its biggest difference from other data stems from medical activity itself: medical behaviors are linked by clear logical relationships, so medical data carries clear semantic relationships. Moreover, medical data is usually input or generated by a doctor or a patient, so the semantic relationships between data items can reflect the doctor's decisions and the correlation between the chosen treatment and the development of the disease.
When a doctor studies medical knowledge or needs to make a diagnostic decision, a standardized database helps the doctor decide quickly and has significant reference and guidance value. In the prior art, data processing focuses mainly on everyday behavioral data of the general population, and the approach generally adopted is weighted fusion, which cannot be applied to medical data.
As technology has developed, the processing of medical data has also been studied, but such processing still targets a single type of medical data, which is processed comprehensively and stored in its own database. An isolated, single-type database has limited reference value for doctors, who must still spend considerable effort integrating different types of databases before the data can guide actual work.
The generation of marking points is therefore particularly important if batch processing of medical data is to be realized.
Disclosure of Invention
The technical problem solved by the technical scheme of the invention is how to provide accurate marking points for batch processing of medical data, so that medical big data can be processed quickly and accurately.
To solve this technical problem, the technical scheme of the invention is a control method for generating key factors based on medical data, comprising the following steps:
a. defining the data model $\mathrm{mod} = \langle M, R, \phi \rangle$, where $M$ denotes a variable set composed of a plurality of key variables, $M = \{m_1, m_2, \ldots, m_n\}$; $R$ denotes the set of semantic relationships among the key variables, $R = \{r_1, r_2, \ldots, r_n\}$; and $\phi$ denotes the association function corresponding to the semantic relationships between the key variables, associating each $r_i \in R$ with an ordered pair $\langle m_q, m_p \rangle \in M \times M$, in which $m_q$ denotes the start variable and $m_p$ denotes the end variable;
b. defining a data set $D = \{d_1, d_2, \ldots, d_n\}$ composed of a plurality of basic medical data; if $m_i = \mathrm{mod}(d_i)$, where $m_i \in M$ and $d_i \in D$, then $d_i$ is a key factor.
Preferably, the step a is followed by the steps of:
a1. constructing a multivariate index map;
a2. performing a fusion step on the set of variables M based on the multivariate index map.
Preferably, the step a2 is followed by the following steps:
a3. performing a fusion step on the set of semantic relationships R based on the multivariate index map.
Preferably, the step a1 includes the following steps:
i. extracting the features of each key variable one by one and establishing a univariate index for each key variable based on its features;
ii. establishing, between the univariate indexes, edges corresponding to the semantic relationships among the key variables;
iii. mining, pair by pair, association rules among the univariate indexes that have semantic relationships, and establishing edges corresponding to the association rules;
iv. constructing the multivariate index map $G_C$ [formula not preserved in the source text], wherein $V(G_C)$ is the set of all the features corresponding to all the key variables, $E(G_C)$ is the set of the edges corresponding to all the semantic relationships and the edges corresponding to all the association rules, and the remaining component is a function corresponding to the association rules between the univariate indexes.
Preferably, the step ii comprises the steps of:
ii1. if two key variables have a semantic relationship, judging that the two univariate indexes corresponding to those key variables have a semantic relationship;
ii2. connecting, one by one, the two univariate indexes that have a semantic relationship.
Preferably, in step iii, the association rules are mined as follows:
iii1. constructing a plurality of feature chains [formula not preserved in the source text] from the features corresponding to the key variables that have semantic relationships, each feature chain satisfying: any two adjacent features in the chain have a semantic relationship, wherein the first feature of the chain is the start feature, the last feature is the end feature, and $C$ denotes the feature set containing all the features;
iii2. calculating the probability and the conditional probability of each feature chain, and defining the minimum probability as the minimum support and the minimum conditional probability as the minimum confidence;
iii3. if the implication from the start feature to the end feature of a chain satisfies both the minimum support and the minimum confidence, that implication is an association rule between the univariate index built on the start feature and the univariate index built on the end feature.
Preferably, the step a2 includes the following steps:
a21. based on the multivariate index map $G_C$, establishing an independent feature set $C'_P$ for each variable, the independent feature set $C'_P$ satisfying: there is no edge $e \in E(G_C)$ with $\mathrm{type}(e) = 0$ that connects a feature $C_i \in C'_P$ to a feature $C_j \in V(G_C)$;
a22. calculating the variable fusion weight $w_p$ of the variable $m_p$ according to the following formula [formula not preserved in the source text], where $m_p \in M$ and $x$ denotes the number of features in the independent feature set $C'_P$;
a23. performing a fusion step on the variable set $M$ based on the variable fusion weights.
Preferably, the step a3 includes the following steps:
a31. obtaining the associated variable set $M'$ of all variables that have a semantic relationship with $m_p$, the set $M'$ satisfying: for any $m_i \in M'$ there is an $r_{pi} \in R$ such that $\phi(r_{pi}) = \langle m_p, m_i \rangle$ or $\phi(r_{pi}) = \langle m_i, m_p \rangle$, where $M' \subseteq M$, $r_{pi}$ is a semantic relationship corresponding to $m_p$, and the fusion weight of the variable $m_i$ in $M'$ is $w_i$;
a32. calculating the semantic relationship fusion weight of the semantic relationship $r_{pi}$ according to the following formula [formula not preserved in the source text], where $y$ denotes the number of variables in the associated variable set $M'$;
a33. performing a fusion step on the semantic relationship set $R$ based on the semantic relationship fusion weights.
Preferably, the order of all key variables in the variable set M is randomly rearranged.
Preferably, the step b is followed by the steps of:
c. acquiring the basic medical data corresponding to a plurality of source terminals, wherein the source terminals are selected in a manual screening mode;
d. screening at least one correction factor from the basic medical data corresponding to the plurality of source terminals based on the data model, wherein the correction factor and the key factor have the same data structure;
e. a correction step is performed on the key factor based on the correction factor.
The invention generates key factors through a specific algorithm and provides a basis for the standardized processing of medical big data.
The technical scheme of the invention also corrects the key factor by introducing the source terminal.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flowchart of a control method for generating key factors based on medical data, according to an embodiment of the present invention;
FIG. 2 is a flowchart of another method of generating key factors, according to the first embodiment of the present invention;
FIG. 3 is a flowchart of another method of generating key factors, according to the second embodiment of the present invention;
FIG. 4 is a flowchart of a control method for generating key factors based on medical data, according to a third embodiment of the present invention.
Detailed Description
In order to better and clearly show the technical scheme of the invention, the invention is further described with reference to the attached drawings.
The skilled person understands that medical data typically comes from a user terminal, that is, a terminal device that collects data through manual input or through a connection to various detection devices. For example, data can be entered manually or recognized automatically from a photograph, or it can be acquired in real time through an open port from a computer that shares data with a vital-sign sensor or a medical detection device. More specifically, basic medical data is data associated with an individual and can be understood along several dimensions. By generation channel, basic medical data divides mainly into doctor-side data and patient-side data: doctor-side data includes outpatient and emergency records, hospitalization records, imaging records, laboratory records, medication records, operation records, follow-up records, and the like; patient-side data includes personal living habits, living environment, family heredity, family environment, and the like.
From the composition structure of the basic medical data, the basic medical data can be divided into: (1) measurement values generated by the examination means, such as body temperature, blood pressure, blood oxygen saturation, assay values, and the like; (2) signals recorded by the instrument, such as electrocardiography, electroencephalography, and the like; (3) images generated by medical imaging equipment, such as X-ray images, CT images, MRI images and the like; (4) report results presented in text form, such as explanations given by doctors in combination with their own medical knowledge for measurement values, signals, images, pathological diagnoses made by doctors, and the like; (5) narrative data, such as complaints recorded by a physician (patient-dictated illness), patient history; (6) metadata text, such as knowledge about organs, drugs, diseases, and treatment methods, parameters of medical devices, and the like; (7) social characteristics such as institution information of hospitals, personal information of doctors and patients, and the like. Although the structures and the contained semantics of the different types of basic medical data are different, the different types of basic medical data can mutually prove and complement each other, and all express the content and the characteristics of medical information from a specific angle to form a diverse and complementary data set.
As a specific embodiment of the present invention, FIG. 1 shows a flowchart for generating a key factor, which specifically includes the following steps:
First, step S101 is executed to define the data model $\mathrm{mod} = \langle M, R, \phi \rangle$, where $M$ denotes a variable set composed of a plurality of key variables, $M = \{m_1, m_2, \ldots, m_n\}$; $R$ denotes the set of semantic relationships among the key variables, $R = \{r_1, r_2, \ldots, r_n\}$; and $\phi$ denotes the association function corresponding to the semantic relationships between the key variables, associating each $r_i \in R$ with an ordered pair $\langle m_q, m_p \rangle \in M \times M$, in which $m_q$ denotes the start variable and $m_p$ denotes the end variable. Specifically, key variables may be distinguished by data structure alone: text, audio, image, and video data are defined as different key variables because their data structures differ. Key variables may also be distinguished by how the data is generated as well as by data structure: CT data and MRI data are both image data, but because they are generated differently they may be further divided into separate variables. More specifically, a semantic relationship reflects the dependency between different key variables; the dependency is directional, which is why each relationship has a start variable and an end variable. The association function represents the logical algorithm underlying the dependency.
Further, the key variables may be defined manually or derived from an existing public database. Preferably, the key variables are derived by taking the disease category as the standard and considering the multidimensional data related to each disease. Taking hypertension as an example, data such as sex, age, blood pressure value, dietary structure, familial heredity, and amount of exercise may be defined as key variables; key variables for other diseases may be derived in a similar manner. More specifically, the data types used to define the key variables are also varied, and may include purely numeric data (such as physical examination results and vital-sign parameters), waveform signals (such as electrocardiogram and electroencephalogram signals), images (such as images generated by medical instruments such as MRI and CT), text (such as patient information, symptom descriptions, and diagnosis results), and the like.
Those skilled in the art understand that key variables in the medical field are commonly generated as a series of variables around one patient, or alternatively around one study or one disease, so that medical key variables are semantically related to one another, unlike general big data from everyday life. Specifically, the plurality of key variables may be obtained by classifying medical data with common methods such as an SVM, a BP neural network, or a decision tree. The data model is preferably set on the basis of the key variables for each disease type. The data model may be understood as a logical data model, that is, a data model supported by a specific database management system, of which there are three main types: the network data model, the hierarchical data model, and the relational data model. The data model is both user-oriented and system-oriented and is mainly used to implement a database management system. In a database, the data model abstracts, represents, and processes real-world data and information, and mainly concerns the logical structure of the data.
Further, the key factor is obtained by processing the basic medical data according to its processing principles and medical logic. For example, analyzing the basic medical data yields a number of different semantics; from the interrelation of those semantics, a number of nodes that are indispensable to the logical relationship are identified, and the key factor is obtained from those nodes. For example, after a patient, Zhang San, visits a hospital in Yinchuan, Ningxia, a series of basic medical data is formed, including age, sex, place of residence, dietary structure, medicines taken, symptoms, laboratory indexes, image data, disease, attending doctor, and the like. Age, symptoms, laboratory indexes, and image data are taken as the nodes; the basic medical data of other patients with the same disease as Zhang San is selected, the data values of the nodes are obtained by statistics, and a key factor is calculated from those data values. As a variation, the basic medical data of other patients seen by the same doctor as Zhang San can be selected instead, with the node data values obtained by statistics and the key factor calculated from them. As another variation, diet and medicine can be added as nodes, which expands the range of key factors.
In a specific application example, the following key variables can be established: $m_1$ representing patient information, $m_2$ representing doctor information, $m_3$ representing disease information, $m_4$ representing the patient's laboratory examination indexes, $m_5$ representing the patient's CT images, and $m_6$ representing the doctor's advice. A semantic relationship set over these key variables is then established: $m_1$ and $m_3$ are linked by the relationship between patient and disease; $m_2$ and $m_3$ by the relationship between doctor and disease; $m_4$ and $m_1$ by the relationship in which the examination indexes describe the patient; $m_4$ and $m_2$ by the relationship in which the doctor orders the examination indexes; $m_6$ and $m_2$ by the relationship in which the doctor gives the advice; $m_5$ and $m_1$ by the relationship in which the images describe the patient; and $m_6$ and $m_5$ by the relationship of the diagnostic method. Accordingly, each semantic relationship has its own association function to be determined.
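The application example can be written down as a small data structure. The following Python sketch is illustrative only: the class name `DataModel`, the relation labels `r1` to `r7`, the direction chosen for each pair, and the reading of the doctor-related relationships as involving m2 are all assumptions made for the example, not part of the described method.

```python
from dataclasses import dataclass


@dataclass
class DataModel:
    """Hypothetical container for mod = <M, R, phi>."""
    variables: set   # M: key variables
    relations: set   # R: semantic relationships
    phi: dict        # phi: relation -> (start variable, end variable)

    def validate(self) -> None:
        """Check that phi maps every relation to a pair in M x M."""
        for r in self.relations:
            start, end = self.phi[r]
            if start not in self.variables or end not in self.variables:
                raise ValueError(f"{r} refers to a variable outside M")


M = {"m1", "m2", "m3", "m4", "m5", "m6"}

# The (start, end) directions below are assumed for illustration.
phi = {
    "r1": ("m1", "m3"),  # patient and disease
    "r2": ("m2", "m3"),  # doctor and disease
    "r3": ("m4", "m1"),  # examination indexes describe the patient
    "r4": ("m4", "m2"),  # examination indexes ordered by the doctor
    "r5": ("m6", "m2"),  # advice given by the doctor
    "r6": ("m5", "m1"),  # CT images describe the patient
    "r7": ("m6", "m5"),  # diagnostic method
}

model = DataModel(variables=M, relations=set(phi), phi=phi)
model.validate()  # raises ValueError if any pair falls outside M x M
```

Keeping φ as a plain mapping from relation to ordered pair makes the direction of each dependency explicit, which later steps can traverse.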
Further, step S102 is executed to define a data set $D = \{d_1, d_2, \ldots, d_n\}$ composed of a plurality of the basic medical data; if $m_i = \mathrm{mod}(d_i)$, where $m_i \in M$ and $d_i \in D$, then $d_i$ is a key factor. Specifically, a database composed of the basic medical data is processed according to the data model to obtain a series of data sets with logical relationships, and the degree of overlap between those data sets and the variable set composed of the key variables is then evaluated to obtain the key factors.
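The selection rule, that $d_i$ is a key factor when $\mathrm{mod}(d_i)$ is a key variable in $M$, can be sketched as a simple filter. This is a minimal sketch under stated assumptions: the function `select_key_factors`, the toy `toy_mod` lookup on a `kind` field, and the sample records are hypothetical, and a real `mod` would be whatever classification the data model provides.

```python
def select_key_factors(dataset, mod, variables):
    """Return the items d of D for which mod(d) is a key variable in M."""
    return [d for d in dataset if mod(d) in variables]


# Hypothetical stand-in for mod: look up the record's "kind" field.
KIND_TO_VARIABLE = {"patient": "m1", "disease": "m3", "lab": "m4"}


def toy_mod(record):
    """Map a record to a key variable name, or None if it matches none."""
    return KIND_TO_VARIABLE.get(record["kind"])


M = {"m1", "m3", "m4"}

# Made-up sample records, for illustration only.
D = [
    {"kind": "patient", "value": "age 63"},
    {"kind": "billing", "value": "invoice 17"},
    {"kind": "lab", "value": "glucose 6.1 mmol/L"},
]

key_factors = select_key_factors(D, toy_mod, M)
```

Here the billing record maps to no key variable and is left out, while the patient and laboratory records are kept as key factors.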
As a first embodiment of the present invention, FIG. 2 shows a flowchart of another method of generating a key factor, which specifically includes the following steps:
First, step S201 is executed to define the data model $\mathrm{mod} = \langle M, R, \phi \rangle$, where $M$ denotes a variable set composed of a plurality of key variables, $M = \{m_1, m_2, \ldots, m_n\}$; $R$ denotes the set of semantic relationships among the key variables, $R = \{r_1, r_2, \ldots, r_n\}$; and $\phi$ denotes the association function corresponding to the semantic relationships between the key variables, associating each $r_i \in R$ with an ordered pair $\langle m_q, m_p \rangle \in M \times M$, in which $m_q$ denotes the start variable and $m_p$ denotes the end variable.
Further, step S202 is executed to construct a multivariate index map. Specifically, the multivariate index map is the database structure formed by modeling the basic medical data with the data model; it defines the boundary and the logical data layer structure of the whole database. Accordingly, the semantic relationships constrain the database, as do the association functions corresponding to them. Those skilled in the art understand that data containing many different variables commonly exhibits redundancy, so the semantic relationships and the constraints of the association functions they imply need to be checked. Preferably, the modeling proceeds in a defined order, so that for data with dependency relationships the depended-on data is stored before the dependent data; the correctness of the semantic relationships and the uniqueness of unique attributes can then be checked. The order may be generated by traversing the semantic relationships between the key variables in the variable set.
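One way to obtain an order in which depended-on data is stored before dependent data is a topological sort over the semantic relationships. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). It assumes, for illustration, that the end variable of each relationship depends on its start variable; the function name `storage_order` and the sample relationships are hypothetical.

```python
from graphlib import TopologicalSorter


def storage_order(phi):
    """Order key variables so each start variable precedes its end variable."""
    sorter = TopologicalSorter()
    for start, end in phi.values():
        # Assumption: the end variable depends on the start variable.
        sorter.add(end, start)
    return list(sorter.static_order())


# Hypothetical relationships: relation -> (start variable, end variable).
phi = {
    "r1": ("m1", "m3"),
    "r2": ("m2", "m3"),
    "r3": ("m3", "m4"),
}

order = storage_order(phi)
```

If the relationships contain a cycle, `static_order()` raises `graphlib.CycleError`, which can serve as one of the consistency checks on the semantic relationships.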
Further, step S203 is executed, in which the fusion step is performed on the variable set $M$ based on the multivariate index map. Those skilled in the art understand that some features in the feature set of one key variable are linked by association rules to features of other key variables: when a certain feature appears in the data of one key variable, a corresponding feature appears with high probability in the data of the other key variable. This reflects the correlation between the data of different key variables. The information expressed by features bound by association rules can be calculated from information in other key variables, so that information is duplicated across the two key variables. When several key variables are fused, their similarity must be considered, and when that similarity is calculated the duplicated information should be removed so that only the relatively independent part of each key variable is counted. More specifically, the number of features in a key variable reflects the amount of information in its data; a key variable with more features carries more information and should receive a higher weight in the fusion. Accordingly, the weight of each key variable can be calculated from the multivariate index map, and once the weights are obtained, the variable set composed of all the key variables can be fused based on them.
Further, step S204 is executed, and a fusion step is executed on the semantic relation set R based on the multivariate index map. The principle and implementation of this step are similar to step S203, and those skilled in the art can understand it in conjunction with step S203.
Further, step S205 is executed to define a data set $D = \{d_1, d_2, \ldots, d_n\}$ composed of a plurality of the basic medical data; if $m_i = \mathrm{mod}(d_i)$, where $m_i \in M$ and $d_i \in D$, then $d_i$ is a key factor. Those skilled in the art will appreciate that the embodiment of FIG. 2 differs from the embodiment of FIG. 1 in that the variable set and the semantic relationship set are fused; the fused data model mod is thereby optimized, so the key factors obtained are more accurate. This embodiment is generally not suited to the early stage of database formation; it is enabled once the database has accumulated a certain amount of data, and in particular may be enabled briefly during a data peak period.
As a second embodiment of the present invention, FIG. 3 shows another flowchart for generating key factors based on medical data, which specifically includes the following steps:
First, step S301 is executed to define the data model $\mathrm{mod} = \langle M, R, \phi \rangle$, where $M$ denotes a variable set composed of a plurality of key variables, $M = \{m_1, m_2, \ldots, m_n\}$; $R$ denotes the set of semantic relationships among the key variables, $R = \{r_1, r_2, \ldots, r_n\}$; and $\phi$ denotes the association function corresponding to the semantic relationships between the key variables, associating each $r_i \in R$ with an ordered pair $\langle m_q, m_p \rangle \in M \times M$, in which $m_q$ denotes the start variable and $m_p$ denotes the end variable.
Further, step S302 is executed to extract the features of each of the key variables one by one and establish a univariate index one by one based on the features.
Further, step S303 is executed to establish an edge corresponding to the semantic relationship between the single variable indexes based on the semantic relationship between the multiple key variables.
Specifically, a preferred algorithm for establishing the semantic relationship edges is as follows:
First, if two key variables have a semantic relationship, the two univariate indexes corresponding to those key variables are judged to have a semantic relationship;
Second, the two univariate indexes that have a semantic relationship are connected one by one.
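The two steps above can be sketched as follows. The text does not specify how two univariate indexes are connected "one by one"; this sketch assumes that every feature of one index is linked to every feature of the other, which is only one possible reading. The function name `semantic_edges`, the edge tuple layout, and the sample feature names are hypothetical.

```python
from itertools import product


def semantic_edges(features_by_variable, phi):
    """Create feature-level edges for every pair of related key variables."""
    edges = []
    for start, end in phi.values():
        start_features = features_by_variable[start]
        end_features = features_by_variable[end]
        # Assumption: connect each feature of one index to each of the other.
        for a, b in product(start_features, end_features):
            edges.append((a, b, "semantic"))
    return edges


# Hypothetical univariate indexes: key variable -> extracted features.
features_by_variable = {
    "m1": ["age_band", "sex"],
    "m3": ["disease_code"],
    "m4": ["systolic_bp", "diastolic_bp"],
}

# Hypothetical relationships: relation -> (start variable, end variable).
phi = {"r1": ("m1", "m3"), "r3": ("m4", "m1")}

edges = semantic_edges(features_by_variable, phi)
```

Tagging each edge with its kind keeps semantic edges distinguishable from the association-rule edges added in the next step.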
Further, step S304 is executed: association rules are mined, pair by pair, between the univariate indexes that have semantic relationships, and edges corresponding to the association rules are established.
Specifically, a preferred algorithm for establishing the association rule edges is as follows:
First, a plurality of feature chains [formula not preserved in the source text] is constructed from the features corresponding to the key variables that have semantic relationships, each feature chain satisfying: any two adjacent features in the chain have a semantic relationship, wherein the first feature of the chain is the start feature, the last feature is the end feature, and $C$ denotes the feature set containing all the features;
Second, the probability and the conditional probability of each feature chain are calculated, and the minimum probability is defined as the minimum support and the minimum conditional probability as the minimum confidence;
Finally, if the implication from the start feature to the end feature of a chain satisfies both the minimum support and the minimum confidence, that implication is an association rule between the univariate index built on the start feature and the univariate index built on the end feature.
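The support and confidence test can be illustrated with chains of length two, that is, a start feature implying an end feature. This is a simplified sketch, not the full feature-chain procedure: the function name `mine_rules`, the record format (one set of observed features per record), the thresholds, and the sample data are all assumptions for the example.

```python
def mine_rules(records, candidate_pairs, min_support, min_confidence):
    """Keep implications a -> b that meet both support and confidence."""
    n = len(records)
    rules = []
    for a, b in candidate_pairs:
        count_a = sum(1 for r in records if a in r)
        count_ab = sum(1 for r in records if a in r and b in r)
        if count_a == 0:
            continue  # the start feature never occurs; nothing to infer
        support = count_ab / n           # P(a and b)
        confidence = count_ab / count_a  # P(b | a)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Made-up records: each is the set of features observed together.
records = [
    {"age_over_60", "high_bp"},
    {"age_over_60", "high_bp"},
    {"age_over_60"},
    {"high_bp"},
]

# Candidate pairs are features already linked by a semantic relationship.
rules = mine_rules(records, [("age_over_60", "high_bp")], 0.4, 0.6)
```

With these records the implication has support 2/4 and confidence 2/3, so it passes both thresholds and would become an association-rule edge.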
Further, step S305 is executed to construct the multivariate index map $G_C$, where $V(G_C)$ is the set of all the features corresponding to all the key variables, $E(G_C)$ is the set of the edges corresponding to all the semantic relationships and the edges corresponding to all the association rules, and the map further comprises a function corresponding to the association rules between the univariate indexes.
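As a non-limiting illustration, the multivariate index map may be held in a small graph structure such as the Python sketch below. The class names, the string labels `"semantic"` and `"rule"`, and the `rule_of` dictionary standing in for the function over association rules are all assumptions of this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: str  # "semantic" or "rule"; these labels are an assumption


@dataclass
class MultivariateIndexGraph:
    vertices: set = field(default_factory=set)   # features, i.e. V(G_C)
    edges: set = field(default_factory=set)      # both edge kinds, i.e. E(G_C)
    rule_of: dict = field(default_factory=dict)  # rule edge -> rule description

    def add_edge(self, source, target, kind, rule=None):
        """Add an edge, register its end points, and record the rule if any."""
        edge = Edge(source, target, kind)
        self.vertices.update((source, target))
        self.edges.add(edge)
        if kind == "rule":
            self.rule_of[edge] = rule
        return edge


# Hypothetical example data.
graph = MultivariateIndexGraph()
graph.add_edge("salt_intake", "systolic", "semantic")
rule_edge = graph.add_edge(
    "salt_intake", "systolic", "rule", rule="salt_intake => systolic"
)
```

Keeping the two edge kinds in one set, distinguished by a label, mirrors the definition of the edge set as the union of semantic edges and association-rule edges.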
Further, step S306 is executed, and the fusion step is executed on the variable set M based on the multivariate index map. In particular, a preferred algorithm is shown below:
first, based on the multivariate index map $G_C$, an independent feature set $C'_P$ is established for each of the different variables. The independent feature set $C'_P$ satisfies the following condition: there is no edge $e \in E(G_C)$ with $\mathrm{type}(e) = 0$ that joins a feature $C_i \in C'_P$ to a feature $C_j \in V(G_C)$;
next, the variable fusion weight $w_p$ of the variable $m_p$ is calculated according to a formula [formula not reproduced], where $m_p \in M$ and $x$ denotes the number of features contained in the independent feature set $C'_P$;
finally, a fusion step is performed on the set of variables M based on the variable fusion weights.
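As a non-limiting illustration, the independent feature set and the count $x$ may be obtained as in the Python sketch below. The sketch assumes that $\mathrm{type}(e) = 0$ marks an edge corresponding to a semantic relationship and that edges are undirected; neither is stated explicitly in the text. The weight formula itself is not available here, so the sketch stops at $x$.

```python
SEMANTIC = 0  # assumed meaning of type(e) = 0


def independent_features(features, edges):
    """Return the features that no type-0 edge touches.

    features: set of features belonging to one variable
    edges:    iterable of (c_i, c_j, edge_type) triples from the index map
    """
    linked = set()
    for c_i, c_j, edge_type in edges:
        if edge_type == SEMANTIC:
            linked.update((c_i, c_j))
    return {c for c in features if c not in linked}


# Hypothetical example data.
features_of = {
    "blood_pressure": {"systolic", "diastolic", "pulse_pressure"},
    "diet": {"salt_intake"},
}
edges = [
    ("systolic", "salt_intake", 0),        # semantic edge
    ("diastolic", "pulse_pressure", 1),    # association-rule edge
]
independent = {v: independent_features(f, edges) for v, f in features_of.items()}
x = {v: len(s) for v, s in independent.items()}
```

The resulting counts would then feed whatever weight formula is used for the variable fusion step.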
Further, step S307 is executed, and a fusion step is executed on the semantic relation set R based on the multivariate index map. In particular, a preferred algorithm is shown below:
first, the associated variable set $M'$ of all variables having a semantic relationship with $m_p$ is obtained. The associated variable set $M'$ satisfies the following condition: for any $m_i \in M'$ there exists $r_{pi} \in R$ that relates $m_p$ and $m_i$ in one direction or the other, where $M' \subseteq M$, $r_{pi}$ is the semantic relationship corresponding to $m_p$, and $w_i$ is the fusion weight corresponding to the variable $m_i$ in the variable set $M'$;
next, the semantic relationship fusion weight of the semantic relationship $r_{pi}$ is calculated according to a formula [formula not reproduced], where $y$ denotes the number of variables contained in the associated variable set $M'$;
finally, a fusion step is performed on the semantic relationship set $R$ based on the semantic relationship fusion weights.
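As a non-limiting illustration, the associated variable set $M'$ and the count $y$ may be collected as in the Python sketch below. The sketch assumes the correlation function is available as a dictionary `phi` from each semantic relationship to its (origin, endpoint) pair; that representation, the function name, and the example data are hypothetical. The weight formula is not available here, so only $M'$ and $y$ are computed.

```python
def associated_variables(m_p, phi):
    """Return {m_i: relation} for every variable related to m_p in either direction.

    phi: dict mapping a relation id to its (origin variable, endpoint variable) pair
    """
    associated = {}
    for relation, (origin, endpoint) in phi.items():
        if origin == m_p and endpoint != m_p:
            associated[endpoint] = relation
        elif endpoint == m_p and origin != m_p:
            associated[origin] = relation
    return associated


# Hypothetical example data.
phi = {
    "r1": ("diet", "blood_pressure"),
    "r2": ("blood_pressure", "medication"),
    "r3": ("diet", "symptom"),
}
m_prime = associated_variables("blood_pressure", phi)
y = len(m_prime)
```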
Further, step S308 is executed to define a data set $D = \{d_1, d_2, \ldots, d_n\}$ composed of a plurality of the basic medical data; if $m_i = \mathrm{mod}(d_i)$, where $m_i \in M$ and $d_i \in D$, then $d_i$ is the key factor.
Those skilled in the art will appreciate that the second embodiment differs from the first embodiment in that the second embodiment shows a specific algorithm for generating key factors based on a feature fusion method, which is easier to apply in practice.
As a variation of the first and second embodiments, the order of all the key variables in the variable set M is randomly rearranged.
As a third embodiment of the present invention, fig. 4 shows a control method for generating a key factor based on medical data, comprising the steps of:
first, step S401 is executed to define the data model $mod = \langle M, R, \phi \rangle$, where $M$ denotes a variable set composed of a plurality of the key variables, $M = \{m_1, m_2, \ldots, m_n\}$; $R$ denotes the set of semantic relationships between the plurality of key variables, $R = \{r_1, r_2, \ldots, r_n\}$; and $\phi$ denotes a correlation function corresponding to the semantic relationships between the key variables, which relates a semantic relationship $r_i \in R$ to an ordered pair $\langle m_q, m_p \rangle \in M \times M$, where $m_q$ denotes the origin variable and $m_p$ denotes the endpoint variable;
further, step S402 is executed to define a data set $D = \{d_1, d_2, \ldots, d_n\}$ composed of a plurality of the basic medical data; if $m_i = \mathrm{mod}(d_i)$, where $m_i \in M$ and $d_i \in D$, then $d_i$ is the key factor;
those skilled in the art will understand that steps S401 and S402 can be implemented with reference to the descriptions and variations of the specific implementation, the first embodiment, and the second embodiment of the present invention, which are not described again here.
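As a non-limiting illustration, steps S401 and S402 may be sketched in Python as follows. The class `DataModel`, the `classify` callable standing in for $\mathrm{mod}(d_i)$, and the record layout with a `kind` field are assumptions of this sketch; how a real system classifies a record is not specified here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DataModel:
    variables: set   # M, the key variables
    relations: set   # R, the semantic relationships
    phi: dict        # relation -> (origin variable, endpoint variable)
    classify: Callable[[dict], Optional[str]]  # hypothetical stand-in for mod(d)


def key_factors(model, dataset):
    """Keep each record d whose classification mod(d) is a key variable in M."""
    return [d for d in dataset if model.classify(d) in model.variables]


# Hypothetical example data.
model = DataModel(
    variables={"blood_pressure", "diet"},
    relations={"r1"},
    phi={"r1": ("diet", "blood_pressure")},
    classify=lambda d: d.get("kind"),
)
dataset = [
    {"kind": "blood_pressure", "value": "150/95"},
    {"kind": "billing_code", "value": "A12"},
    {"kind": "diet", "value": "high salt"},
]
factors = key_factors(model, dataset)
```

Records whose classification falls outside $M$, such as the billing code above, are not key factors.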
Further, step S403 is executed to acquire the basic medical data corresponding to a plurality of source terminals, the source terminals being selected by manual screening. Those skilled in the art will understand that, to improve the accuracy of the key factor, the basic medical data of the source terminals is retrieved in this step so that the key factor obtained in the foregoing steps can be corrected. Specifically, manual screening of the source terminals means that, in a conventional medical big-data processing project, representative user terminals that appear during processing are manually marked, and the marked user terminals are then used as the source terminals. In a typical application scenario, during routine data processing a user terminal raises an objection to a result produced by the system; the objection is manually reviewed, and the user terminal is marked accordingly.
Further, step S404 is executed to screen out at least one correction factor from the basic medical data corresponding to the plurality of source terminals based on the data model. Specifically, this step may be implemented with reference to the specific implementation shown in fig. 1 and the embodiments that follow it, which are not described again here. More specifically, the correction factor differs from the key factor in its data source: the source of the basic medical data used to generate the key factor shown in fig. 1 is not limited, whereas the basic medical data used to generate the correction factor is limited to the source terminals.
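As a non-limiting illustration, restricting the screening to the manually selected source terminals may be sketched in Python as follows. The record fields `terminal` and `kind`, the function name, and the example identifiers are assumptions of this sketch, and the classification is reduced to a field lookup.

```python
def correction_factors(records, source_terminals, variables, classify):
    """Screen correction factors, using only records from the source terminals."""
    from_sources = [r for r in records if r["terminal"] in source_terminals]
    # Same screening rule as for key factors: keep records that map to a key variable.
    return [r for r in from_sources if classify(r) in variables]


# Hypothetical example data.
records = [
    {"terminal": "T1", "kind": "blood_pressure", "value": 130.0},
    {"terminal": "T2", "kind": "blood_pressure", "value": 155.0},
    {"terminal": "T1", "kind": "billing_code", "value": 0.0},
]
source_terminals = {"T1"}  # manually marked terminals
corrections = correction_factors(
    records, source_terminals, {"blood_pressure", "diet"}, lambda r: r["kind"]
)
```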
Further, step S405 is executed, and a correction step is executed on the key factor based on the correction factor. Specifically, the correction step may be a direct substitution, or may be adjusted using a conventional data fusion method.
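As a non-limiting illustration, the two correction options may be sketched in Python as follows. The weighted average used for the second option is only one assumed example of a conventional data fusion method, and the function name, mode names, and default weight are hypothetical.

```python
def correct(key_factor, correction_factor, mode="replace", weight=0.5):
    """Correct a numeric key factor with a correction factor.

    mode="replace": direct substitution by the correction factor.
    mode="blend":   weighted average, an assumed example of data fusion.
    """
    if mode == "replace":
        return correction_factor
    if mode == "blend":
        return (1 - weight) * key_factor + weight * correction_factor
    raise ValueError(f"unknown mode: {mode}")


replaced = correct(140.0, 130.0)
blended = correct(140.0, 130.0, mode="blend", weight=0.25)
```

Direct substitution trusts the source terminal fully, while blending lets the weight express how much the manually reviewed data should shift the original value.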
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.
Claims (7)
1. A control method for generating key factors based on medical data is characterized by comprising the following steps:
a. defining a data model $mod = \langle M, R, \phi \rangle$, where $M$ denotes a variable set composed of a plurality of key variables, $M = \{m_1, m_2, \ldots, m_n\}$, the key variables being obtained by classification according to the medical data; $R$ denotes the set of semantic relationships between the plurality of key variables, $R = \{r_1, r_2, \ldots, r_n\}$; and $\phi$ denotes a correlation function corresponding to the semantic relationships between the key variables, which relates a semantic relationship $r_i \in R$ to an ordered pair $\langle m_q, m_p \rangle \in M \times M$, where $m_q$ denotes the origin variable and $m_p$ denotes the endpoint variable;
a1. constructing a multivariate index map;
i. extracting the characteristics of each key variable one by one and establishing a univariate index one by one based on the characteristics;
ii. establishing edges corresponding to the semantic relationships between the univariate indexes based on the semantic relationships between the key variables;
iii. mining, pair by pair, association rules between the univariate indexes having semantic relationships, and establishing edges corresponding to the association rules;
iv. constructing the multivariate index map $G_C$, wherein $V(G_C)$ is the set of all the features corresponding to all the key variables, $E(G_C)$ is the set of the edges corresponding to all the semantic relationships and the edges corresponding to all the association rules, and the map further comprises a function corresponding to the association rules between the univariate indexes;
a2. performing a fusion step on the set of variables M based on the multivariate index map;
a3. performing a fusion step on the semantic relation set R based on the multivariate index map;
b. defining a data set $D = \{d_1, d_2, \ldots, d_n\}$ composed of a plurality of basic medical data; if $m_i = \mathrm{mod}(d_i)$, where $m_i \in M$ and $d_i \in D$, then $d_i$ is the key factor.
2. The control method according to claim 1, wherein the step ii includes the steps of:
ii1. if two key variables have a semantic relationship, judging that the two univariate indexes corresponding to the two key variables have a semantic relationship;
ii2. connecting, one pair at a time, the two univariate indexes having a semantic relationship.
3. The control method according to claim 2, wherein in the step iii, the association rule is mined by:
iii1. constructing a plurality of feature chains from the features corresponding to the key variables that have semantic relationships, each feature chain satisfying the following conditions: $m \neq n$, $m \le n$, $i \neq j$, $i \le m$, $j \le n$, $|i - j| \ge 3$, and any two adjacent features contained in each feature chain have a semantic relationship, where each chain runs from a start-point feature to an end-point feature, and $C$ denotes the feature set that includes all of the features;
iii2. calculating the probability and the conditional probability of each of the feature chains, and defining the minimum probability as the minimum support and the minimum conditional probability as the minimum confidence;
4. The control method according to claim 3, wherein the step a2 includes the steps of:
a21. based on the multivariate index map $G_C$, establishing an independent feature set $C'_P$ for each of the different variables, the independent feature set $C'_P$ satisfying the following condition: there is no edge $e \in E(G_C)$ with $\mathrm{type}(e) = 0$ that joins a feature $C_i \in C'_P$ to a feature $C_j \in V(G_C)$;
a22. calculating the variable fusion weight $w_p$ of the variable $m_p$ according to a formula [formula not reproduced], where $m_p \in M$ and $x$ denotes the number of features contained in the independent feature set $C'_P$;
a23. performing a fusion step on the set of variables M based on the variable fusion weights.
5. The control method according to claim 4, characterized in that said step a23 is followed by the steps of:
a31. obtaining the associated variable set $M'$ of all variables having a semantic relationship with $m_p$, the associated variable set $M'$ satisfying the following condition: for any $m_i \in M'$ there exists $r_{pi} \in R$ that relates $m_p$ and $m_i$ in one direction or the other, where $M' \subseteq M$, $r_{pi}$ is the semantic relationship corresponding to $m_p$, and $w_i$ is the fusion weight corresponding to the variable $m_i$ in the variable set $M'$;
a32. calculating the semantic relationship fusion weight of the semantic relationship $r_{pi}$ according to a formula [formula not reproduced], where $y$ denotes the number of variables contained in the associated variable set $M'$;
a33. and executing a fusion step on the semantic relation set R based on the semantic relation fusion weight.
6. Control method according to any of claims 1 to 5, characterized in that the order of all key variables in the set of variables M is randomly rearranged.
7. The control method according to any one of claims 1 to 5, characterized in that the step b is followed by the step of:
c. acquiring the basic medical data corresponding to a plurality of source terminals, wherein the source terminals are selected in a manual screening mode;
d. screening at least one correction factor from the basic medical data corresponding to the plurality of source terminals based on the data model, wherein the correction factor and the key factor have the same data structure;
e. a correction step is performed on the key factor based on the correction factor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810952497.5A CN110827945B (en) | 2018-08-14 | 2018-08-14 | Control method for generating key factors based on medical data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110827945A (en) | 2020-02-21 |
CN110827945B (en) | 2022-05-27 |
Family
ID=69547461
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810952497.5A Active CN110827945B (en) | 2018-08-14 | 2018-08-14 | Control method for generating key factors based on medical data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110827945B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||