CN110827945B - Control method for generating key factors based on medical data - Google Patents

Publication number
CN110827945B
Authority
CN
China
Prior art keywords
variable
key
data
semantic
variables
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810952497.5A
Other languages
Chinese (zh)
Other versions
CN110827945A (en)
Inventor
姚娟娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Mingping Medical Data Technology Co ltd
Original Assignee
Shanghai Mingping Medical Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Mingping Medical Data Technology Co ltd
Priority to CN201810952497.5A
Publication of CN110827945A
Application granted
Publication of CN110827945B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00 ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a control method for generating key factors based on medical data, comprising the following steps: a. defining a data model
Figure DSA0000169331850000011
where $M$ denotes a variable set composed of a plurality of key variables, $M = \{m_1, m_2, \ldots, m_n\}$; $R$ denotes a set of semantic relationships among the key variables, $R = \{r_1, r_2, \ldots, r_n\}$;
Figure DSA0000169331850000012
denotes the association function corresponding to the semantic relationships between the key variables,
Figure DSA0000169331850000013
Figure DSA0000169331850000014
$r_i \in R$, $\langle m_q, m_p \rangle \in M \times M$, where $m_q$ denotes the start variable and $m_p$ the end variable; b. defining a data set $D = \{d_1, d_2, \ldots, d_n\}$ consisting of a plurality of items of basic medical data; if $m_i = \mathrm{mod}(d_i)$, where $m_i \in M$ and $d_i \in D$, then $d_i$ is the key factor. With this algorithm, the invention generates key factors from basic medical data and provides a basis for the standardized processing of medical big data.

Description

Control method for generating key factors based on medical data
Technical Field
The invention relates to the field of big data processing, in particular to a method for processing marking points of medical big data, and specifically relates to a control method for generating key factors based on medical data.
Background
With the advent of the big data era, many different types of data are being collected and processed. Medical data is a particularly distinctive type: it contains a wide variety of variables, including patient data, physician data, disease data, symptom data, test data, diagnostic data, treatment data, drug data, and the like. Its biggest difference from other data stems from medical activity itself: there are clear logical relationships between medical behaviors, and therefore clear semantic relationships between medical data. In addition, medical data is usually entered or generated by a doctor or a patient, so the semantic relationships between the data can reflect the doctor's decisions and the correlation between the chosen treatment and the progression of the disease.
When a doctor studies medical knowledge or needs to make a diagnostic decision, a standardized database would help the doctor decide quickly and would have significant reference value and guiding significance. In the prior art, data processing focuses mainly on the daily-life behavioral data of the general population, and the approach generally adopted is weighted fusion; this approach, however, cannot be applied to processing medical data.
With the development of technology, the processing of medical data has also been studied, but existing work still processes one specific type of medical data at a time and builds a database for it. Such an isolated, single-type database has limited reference value for doctors, who still have to spend considerable effort integrating different types of databases before they can use them to guide actual work.
Generating marking points is therefore particularly important if medical data is to be processed in batches.
Disclosure of Invention
The technical problem solved by the technical scheme of the invention is to provide accurate marking points for batch processing of medical data and realize quick and accurate processing of medical big data.
In order to solve the technical problem, the technical scheme of the invention is a control method for generating key factors based on medical data, which comprises the following steps:
a. defining the data model $\mathrm{mod} = \langle M, R, \phi \rangle$, where $M$ denotes a variable set composed of a plurality of key variables, $M = \{m_1, m_2, \ldots, m_n\}$; $R$ denotes a set of semantic relationships among the key variables, $R = \{r_1, r_2, \ldots, r_n\}$; and $\phi$ denotes the association function corresponding to the semantic relationships between the key variables,
Figure GDA0003573626230000021
$r_i \in R$, $\langle m_q, m_p \rangle \in M \times M$, where $m_q$ denotes the start variable and $m_p$ the end variable;
b. defining a data set $D = \{d_1, d_2, \ldots, d_n\}$ composed of a plurality of items of basic medical data; if $m_i = \mathrm{mod}(d_i)$, where $m_i \in M$ and $d_i \in D$, then $d_i$ is the key factor.
Preferably, the step a is followed by the steps of:
a1. constructing a multivariate index map;
a2. performing a fusion step on the set of variables M based on the multivariate index map.
Preferably, the step a2 is followed by the following steps:
a3. performing a fusion step on the set of semantic relationships R based on the multivariate index map.
Preferably, the step a1 includes the following steps:
i. extracting the features of each key variable one by one and establishing a univariate index for each key variable based on its features;
ii. establishing edges corresponding to the semantic relationships between the univariate indexes, based on the semantic relationships between the key variables;
iii. mining, pair by pair, the association rules between the univariate indexes that have semantic relationships, and establishing edges corresponding to the association rules;
iv. constructing the multivariate index map
Figure GDA0003573626230000022
where $V(G_C)$ is the set of all features corresponding to all key variables, $E(G_C)$ is the set of the edges corresponding to all semantic relationships and the edges corresponding to all association rules, and
Figure GDA0003573626230000023
is a function corresponding to the association rules between the univariate indexes.
Preferably, the step ii comprises the steps of:
ii1. if two key variables have a semantic relationship, determining that the two univariate indexes corresponding to those key variables also have a semantic relationship;
ii2. connecting, pair by pair, the univariate indexes that have a semantic relationship.
Preferably, in step iii, the association rule is mined by:
iii1. constructing a plurality of feature chains, based on the features corresponding to the key variables that have semantic relationships,
Figure GDA0003573626230000024
The feature chain
Figure GDA0003573626230000025
satisfies the following conditions:
Figure GDA0003573626230000026
Figure GDA0003573626230000027
and any two adjacent features in each feature chain have a semantic relationship, where
Figure GDA0003573626230000028
is the start feature,
Figure GDA0003573626230000029
is the end feature, and $C$ denotes the feature set that includes all of the features;
iii2. calculating the probability and the conditional probability of each of the feature chains
Figure GDA00035736262300000210
and defining the minimum probability as the minimum support and the minimum conditional probability as the minimum confidence;
iii3. if the implication
Figure GDA00035736262300000211
satisfies both the minimum support and the minimum confidence, then the implication
Figure GDA00035736262300000212
Figure GDA00035736262300000213
is an association rule between the univariate indexes built on
Figure GDA00035736262300000214
and
Figure GDA00035736262300000215
respectively.
Preferably, the step a2 includes the following steps:
a2. based on the multivariate index map
Figure GDA0003573626230000031
establishing an independent feature set $C'_P$ for each variable, where $C'_P$ satisfies the following condition: there exists no $e \in E(G_C)$ such that
Figure GDA0003573626230000032
where $C_i \in C'_P$, $\mathrm{type}(e) = 0$, and $C_j \in V(G_C)$;
a2. calculating the variable fusion weight $w_p$ of the variable $m_p$ according to the following formula:
Figure GDA0003573626230000033
where $m_p \in M$ and $x$ denotes the number of features contained in the independent feature set $C'_P$;
performing a fusion step on the variable set $M$ based on the variable fusion weights.
Preferably, the step a3 is followed by the following steps:
a3. obtaining the associated variable set $M'$ of all variables that have a semantic relationship with $m_p$, where $M'$ satisfies the following condition: for any $m_i \in M'$ there exists $r_{pi} \in R$ such that
Figure GDA0003573626230000034
or
Figure GDA0003573626230000035
where $M' \subseteq M$, $r_{pi}$ is the semantic relationship corresponding to $m_p$, and the fusion weight corresponding to the variable $m_i$ in $M'$ is $w_i$;
a4. calculating the semantic relationship fusion weight of the semantic relationship $r_{pi}$ according to the following formula:
Figure GDA0003573626230000036
Figure GDA0003573626230000037
where $y$ denotes the number of variables contained in the associated variable set $M'$;
a3. performing a fusion step on the semantic relationship set $R$ based on the semantic relationship fusion weights.
Preferably, the order of all key variables in the variable set M is randomly rearranged.
Preferably, the step b is followed by the steps of:
c. acquiring the basic medical data corresponding to a plurality of source terminals, wherein the source terminals are selected in a manual screening mode;
d. screening at least one correction factor from the basic medical data corresponding to the plurality of source terminals based on the data model, wherein the correction factor and the key factor have the same data structure;
e. a correction step is performed on the key factor based on the correction factor.
The invention generates key factors through a specific algorithm and provides a basis for the standardized processing of medical big data.
The technical scheme of the invention also corrects the key factor by introducing the source terminal.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of a control method for generating key factors based on medical data, according to an embodiment of the present invention;
FIG. 2 is a flow chart of another key factor generation according to the first embodiment of the present invention;
FIG. 3 is a flow chart of another key factor generation according to the second embodiment of the present invention;
FIG. 4 shows a control method for generating key factors based on medical data according to a third embodiment of the present invention.
Detailed Description
In order to better and clearly show the technical scheme of the invention, the invention is further described with reference to the attached drawings.
Those skilled in the art understand that medical data typically comes from a user terminal, which may be understood as a terminal device that collects data through manual input or through a connection to various detection devices. For example, data can be entered manually or recognized automatically from a photograph, or acquired in real time through an open port from a computer that shares data with a physical sign sensor or a medical detection device. More specifically, basic medical data is data associated with an individual and can be understood along multiple dimensions. In terms of the channel through which it is generated, basic medical data can be divided mainly into doctor-side data and patient-side data: doctor-side data includes outpatient and emergency records, hospitalization records, imaging records, laboratory records, medication records, operation records, follow-up records, and the like, while patient-side data includes personal living habits, living environment, family heredity, family environment, and the like.
From the composition structure of the basic medical data, the basic medical data can be divided into: (1) measurement values generated by the examination means, such as body temperature, blood pressure, blood oxygen saturation, assay values, and the like; (2) signals recorded by the instrument, such as electrocardiography, electroencephalography, and the like; (3) images generated by medical imaging equipment, such as X-ray images, CT images, MRI images and the like; (4) report results presented in text form, such as explanations given by doctors in combination with their own medical knowledge for measurement values, signals, images, pathological diagnoses made by doctors, and the like; (5) narrative data, such as complaints recorded by a physician (patient-dictated illness), patient history; (6) metadata text, such as knowledge about organs, drugs, diseases, and treatment methods, parameters of medical devices, and the like; (7) social characteristics such as institution information of hospitals, personal information of doctors and patients, and the like. Although the structures and the contained semantics of the different types of basic medical data are different, the different types of basic medical data can mutually prove and complement each other, and all express the content and the characteristics of medical information from a specific angle to form a diverse and complementary data set.
As a specific embodiment of the present invention, fig. 1 shows a flowchart for generating a key factor, which specifically includes the following steps:
First, step S101 is executed to define the data model $\mathrm{mod} = \langle M, R, \phi \rangle$, where $M$ denotes a variable set composed of a plurality of key variables, $M = \{m_1, m_2, \ldots, m_n\}$; $R$ denotes a set of semantic relationships among the key variables, $R = \{r_1, r_2, \ldots, r_n\}$; and $\phi$ denotes the association function corresponding to the semantic relationships between the key variables,
Figure GDA0003573626230000041
$r_i \in R$, $\langle m_q, m_p \rangle \in M \times M$, where $m_q$ denotes the start variable and $m_p$ the end variable. Specifically, the key variables may be distinguished by data structure alone: text, audio, image, and video data are defined as different key variables because their data structures differ. They may also be distinguished by both the data structure and the way the data is generated: for example, CT data and MRI data are both image data, but they can be further divided into separate variables according to how they are generated. Those skilled in the art understand that this explanation of the key variables should be read together with the description of step S101 and is not repeated below. More specifically, semantic relationships reflect the dependencies between different key variables; these dependencies are directional, so each has a start variable and an end variable. The association function represents the logical algorithm underlying the dependency.
Further, the key variables may be defined manually or derived from an existing public database. Preferably, the key variables are derived per disease category by jointly considering the multidimensional data related to each disease. Taking hypertension as an example, data such as sex, age, blood pressure value, dietary structure, family heredity, and amount of exercise may be defined as key variables; key variables for other diseases may be derived in a similar manner. More specifically, the data types used to define the key variables are also varied, and may include purely numerical data (such as physical examination results and physical sign parameters), waveform signals (such as electrocardiogram and electroencephalogram signals), images (such as those generated by medical instruments such as MRI and CT), text (such as patient information, descriptions of symptoms, and diagnosis results), and the like.
Those skilled in the art understand that key variables in the medical field are usually generated as a series of variables centered on a patient, or alternatively on a study or a disease; unlike general big data from everyday life, medically related key variables are therefore semantically related. Specifically, the plurality of key variables may be obtained by classifying medical data with common methods such as an SVM, a BP neural network, or a decision tree. The data model is preferably set up from the key variables for each disease type. It can be understood as a logical data model, that is, a data model supported by a specific database management system; such models mainly fall into three types: the network data model, the hierarchical data model, and the relational data model. The data model is both user-oriented and system-oriented and is mainly used to implement a database management system. Data models are used in databases to abstract, represent, and process real-world data and information, chiefly in order to study the logical structure of the data.
Further, the key factor is derived in accordance with the processing principles of basic medical data and with medical logic. Basic medical data yields a number of different semantics once analyzed; from the interrelations among these semantics, a number of nodes that are indispensable to the logical relationship are obtained, and the key factor can be obtained from these nodes. For example, when a patient, Zhang San, visits a hospital in Yinchuan, Ningxia, a series of basic medical data is produced, including age, sex, place of residence, dietary structure, medication taken, symptoms, laboratory test indices, imaging data, disease, attending doctor, and the like. Age, symptoms, laboratory test indices, and imaging data are then taken as nodes; the basic medical data of other users with the same disease as Zhang San is selected, the data values of the nodes are obtained through statistics, and a key factor is calculated from those values. As a variation, the basic medical data of other patients seen by the same doctor as Zhang San can be selected instead, the data values of the nodes obtained through statistics, and a key factor calculated from those values. As another variation, dietary structure and medication can be added as nodes, which expands the range of key factors.
In a specific application example, the following key variables can be established: $m_1$ representing patient information, $m_2$ representing doctor information, $m_3$ representing disease information, $m_4$ representing the patient's laboratory test indices, $m_5$ representing the patient's CT images, and $m_6$ representing the key medical points. A semantic relationship set over these key variables is then established: $m_1$ and $m_3$ have the patient-disease semantic relationship; $m_2$ and $m_3$ have the doctor-disease semantic relationship; $m_4$ and $m_1$ have the semantic relationship that the laboratory test indices describe the patient; $m_4$ and $m_2$ have the semantic relationship that the indices are made by the doctor; $m_6$ and $m_2$ have the semantic relationship that the key medical points are given by the doctor; $m_5$ and $m_1$ have the semantic relationship that the images describe the patient; and $m_6$ and $m_5$ have the semantic relationship of the diagnostic method. Accordingly, each semantic relationship has its own association function to be determined.
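The example above can be written down as a small data structure. The following is a minimal sketch, not the patented implementation: the class names `Relation` and `DataModel` and the helper `neighbours` are hypothetical, the doctor variable is assumed to be $m_2$, and the first-named variable of each pair is assumed to be the start variable.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """One semantic relationship r_i, directed from a start to an end variable."""
    name: str
    start: str  # m_q
    end: str    # m_p


@dataclass
class DataModel:
    """mod = <M, R, phi>: variables, relationships, and the association function."""
    variables: set
    relations: list = field(default_factory=list)

    def phi(self, relation):
        """Return the ordered pair <m_q, m_p> linked by the relationship."""
        pair = (relation.start, relation.end)
        if not set(pair) <= self.variables:
            raise ValueError(f"{relation.name!r} refers to a variable outside M")
        return pair

    def neighbours(self, variable):
        """All variables that share a semantic relationship with `variable`."""
        linked = set()
        for r in self.relations:
            if r.start == variable:
                linked.add(r.end)
            elif r.end == variable:
                linked.add(r.start)
        return sorted(linked)


# Assumed reading of the example: m2 is the doctor variable, and the
# first-named variable of each pair is taken as the start variable.
model = DataModel(
    variables={"m1", "m2", "m3", "m4", "m5", "m6"},
    relations=[
        Relation("patient-disease", "m1", "m3"),
        Relation("doctor-disease", "m2", "m3"),
        Relation("indices describe patient", "m4", "m1"),
        Relation("indices made by doctor", "m4", "m2"),
        Relation("key points given by doctor", "m6", "m2"),
        Relation("image describes patient", "m5", "m1"),
        Relation("diagnostic method", "m6", "m5"),
    ],
)

print(model.phi(model.relations[0]))
print(model.neighbours("m1"))
```

Storing each relationship with an explicit start and end keeps the direction of the dependency visible, which the text identifies as a defining property of semantic relationships.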
Further, step S101 is executed to define a data set $D = \{d_1, d_2, \ldots, d_n\}$ composed of a plurality of items of basic medical data; if $m_i = \mathrm{mod}(d_i)$, where $m_i \in M$ and $d_i \in D$, then $d_i$ is the key factor. Specifically, a database composed of the basic medical data is processed according to the data model to obtain a series of data sets with logical relationships; the degree of overlap between these data sets and the variable set composed of the key variables is then evaluated to obtain the key factors.
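The selection rule $m_i = \mathrm{mod}(d_i)$ can be illustrated with a short filter. This is a sketch under stated assumptions: the patent does not specify how $\mathrm{mod}$ maps a data item to a variable, so `mod_by_kind` below is a hypothetical stand-in that simply reads a label from each record, and the record fields are invented for illustration.

```python
def select_key_factors(data, variables, mod):
    """Return every d_i in D whose image mod(d_i) is a key variable in M."""
    return [d for d in data if mod(d) in variables]


def mod_by_kind(item):
    # Hypothetical stand-in for mod(): read the variable label from the record.
    return item.get("kind")


# Illustrative variable set M and data set D (made-up records).
M = {"age", "blood_pressure", "sex"}
D = [
    {"kind": "blood_pressure", "value": 150},
    {"kind": "parking_fee", "value": 5},
    {"kind": "age", "value": 61},
]

key_factors = select_key_factors(D, M, mod_by_kind)
print(key_factors)
```

Passing `mod` in as a parameter keeps the filter independent of how the mapping is actually realized, for example by one of the classifiers mentioned earlier in the description.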
As a first embodiment of the present invention, fig. 2 shows a flowchart of another way of generating key factors, which specifically includes the following steps:
First, step S201 is executed to define the data model $\mathrm{mod} = \langle M, R, \phi \rangle$, where $M$ denotes a variable set composed of a plurality of key variables, $M = \{m_1, m_2, \ldots, m_n\}$; $R$ denotes a set of semantic relationships among the key variables, $R = \{r_1, r_2, \ldots, r_n\}$; and $\phi$ denotes the association function corresponding to the semantic relationships between the key variables,
Figure GDA0003573626230000061
$r_i \in R$, $\langle m_q, m_p \rangle \in M \times M$, where $m_q$ denotes the start variable and $m_p$ the end variable.
Further, step S202 is executed to construct a multivariate index map. Specifically, the multivariate index map is a database structure formed by modeling the basic medical data with the data model; it defines the boundary and the logical data-layer structure of the whole database. Accordingly, the semantic relationships constrain the database, as do the association functions corresponding to them. Those skilled in the art understand that data redundancy is common in data comprising many different variables, so the semantic relationships and the constraints implied by their association functions need to be checked. Preferably, modeling proceeds in a definite order so that, for data with dependency relationships, the data depended upon is stored before the data that depends on it; it can then be checked whether the semantic relationships are correct and whether attributes required to be unique are in fact unique. The order may be generated by traversing the semantic relationships between the key variables in the variable set.
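One way to obtain an order in which depended-upon data is stored first is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the patent does not name a specific traversal, so this is a swapped-in technique, and the variable names and the `depends_on` mapping are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def storage_order(depends_on):
    """Order variables so that each one comes after everything it depends on."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


# Hypothetical dependencies: each key maps to the variables it depends on.
depends_on = {
    "patient": set(),
    "doctor": set(),
    "disease": {"patient", "doctor"},
    "examination_index": {"patient", "doctor"},
    "ct_image": {"patient"},
}

order = storage_order(depends_on)
print(order)
```

A circular dependency has no valid storage order, so the sketch surfaces it as an error instead of returning a partial order.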
Further, step S203 is executed: the fusion step is performed on the variable set $M$ based on the multivariate index map. Those skilled in the art understand that association rules exist between some features in the feature set of one key variable and features of other key variables; that is, when a certain feature appears in the data of one key variable, a corresponding feature appears with high probability in the data of the related key variable. This essentially reflects the correlation between the data of different key variables: the information expressed by features that have association rules can be derived from information in other key variables, so that part of the information is duplicated across the two key variables. When multiple key variables are fused, their similarity must be considered, and when that similarity is calculated the duplicated information should be removed so that only the relatively independent part of each key variable is considered. More specifically, the number of features contained in a key variable reflects the amount of information in its data; a key variable with more features carries more information and should be given a higher weight during fusion. Accordingly, the weight of each key variable can be calculated from the multivariate index map, and once the weights are obtained, the variable set consisting of all key variables can be fused based on them.
Further, step S204 is executed, and a fusion step is executed on the semantic relation set R based on the multivariate index map. The principle and implementation of this step are similar to step S203, and those skilled in the art can understand it in conjunction with step S203.
Further, step S205 is executed to define a data set $D = \{d_1, d_2, \ldots, d_n\}$ composed of a plurality of items of basic medical data; if $m_i = \mathrm{mod}(d_i)$, where $m_i \in M$ and $d_i \in D$, then $d_i$ is the key factor. Those skilled in the art will appreciate that the second embodiment differs from the first embodiment in that the variable set and the semantic relationship set are fused; the fused data model mod is thereby optimized, so the key factors obtained are more accurate. In particular, the second embodiment is generally not suitable for use when a database is first being built; it is enabled after the database has accumulated a certain amount of data, in particular for a brief period during a data peak.
As a third embodiment of the present invention, fig. 3 shows another flowchart for generating key factors based on medical data, which specifically includes the following steps:
First, step S301 is executed to define the data model $\mathrm{mod} = \langle M, R, \phi \rangle$, where $M$ denotes a variable set composed of a plurality of key variables, $M = \{m_1, m_2, \ldots, m_n\}$; $R$ denotes a set of semantic relationships among the key variables, $R = \{r_1, r_2, \ldots, r_n\}$; and $\phi$ denotes the association function corresponding to the semantic relationships between the key variables,
Figure GDA0003573626230000071
$r_i \in R$, $\langle m_q, m_p \rangle \in M \times M$, where $m_q$ denotes the start variable and $m_p$ the end variable.
Further, step S302 is executed to extract the features of each key variable one by one and to establish a univariate index for each key variable based on its features.
Further, step S303 is executed to establish edges corresponding to the semantic relationships between the univariate indexes, based on the semantic relationships between the key variables.
Specifically, a preferred algorithm for establishing the semantic relationship edges is as follows:
first, if two key variables have a semantic relationship, it is determined that the two univariate indexes corresponding to those key variables also have a semantic relationship;
second, the univariate indexes that have a semantic relationship are connected pair by pair.
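The two steps above amount to copying each semantic relationship between key variables onto the corresponding pair of univariate indexes. A minimal sketch follows, assuming each univariate index is represented as a set of extracted features keyed by its variable name; the function name `semantic_edges`, the variable names, and the feature labels are all hypothetical.

```python
def semantic_edges(indexes, relations):
    """Connect two univariate indexes whenever their key variables are related.

    indexes   -- mapping: key variable -> set of extracted features (its index)
    relations -- iterable of (start_variable, end_variable) pairs
    """
    edges = []
    for start, end in relations:
        # An edge is only possible when both variables already have an index.
        if start in indexes and end in indexes:
            edges.append((start, end))
    return edges


# Illustrative indexes and relationships (made-up features).
indexes = {
    "patient": {"age>60", "male"},
    "disease": {"hypertension"},
    "ct_image": {"lesion"},
}
relations = [
    ("patient", "disease"),
    ("ct_image", "patient"),
    ("doctor", "disease"),  # skipped below: no index was built for "doctor"
]

edges = semantic_edges(indexes, relations)
print(edges)
```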
Further, step S304 is executed: the association rules between the univariate indexes that have semantic relationships are mined pair by pair, and edges corresponding to the association rules are established.
Specifically, a preferred algorithm for establishing the association rule edges is as follows:
first, a plurality of feature chains are constructed, based on the features corresponding to the key variables that have semantic relationships,
Figure GDA0003573626230000081
The feature chain
Figure GDA0003573626230000082
satisfies the following conditions:
Figure GDA0003573626230000083
Figure GDA0003573626230000084
and any two adjacent features in each feature chain have a semantic relationship, where
Figure GDA0003573626230000085
is the start feature,
Figure GDA0003573626230000086
is the end feature, and $C$ denotes the feature set that includes all of the features;
second, the probability and the conditional probability of each of the feature chains
Figure GDA0003573626230000087
are calculated, and the minimum probability is defined as the minimum support and the minimum conditional probability as the minimum confidence;
finally, if the implication
Figure GDA0003573626230000088
satisfies both the minimum support and the minimum confidence, then the implication
Figure GDA0003573626230000089
is an association rule between the univariate indexes built on
Figure GDA00035736262300000810
and
Figure GDA00035736262300000811
respectively.
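The support and confidence test in the last step can be sketched as follows. This is an illustration under assumptions, not the patented procedure: records are modelled as sets of features, support is taken as the joint probability of the start and end features, confidence as the conditional probability of the end feature given the start feature, and the thresholds are passed in as parameters instead of being derived from the feature chains. All names and data are hypothetical.

```python
def support(records, features):
    """Fraction of records that contain every feature in `features`."""
    if not records:
        return 0.0
    features = set(features)
    return sum(1 for r in records if features <= r) / len(records)


def rule_metrics(records, start, end):
    """Support P(start, end) and confidence P(end | start) of start => end."""
    joint = support(records, {start, end})
    base = support(records, {start})
    confidence = joint / base if base else 0.0
    return joint, confidence


def is_association_rule(records, start, end, min_support, min_confidence):
    """True when start => end meets both thresholds."""
    s, c = rule_metrics(records, start, end)
    return s >= min_support and c >= min_confidence


# Made-up records: each is the set of features observed for one case.
records = [
    {"high_bp_reading", "hypertension"},
    {"high_bp_reading", "hypertension"},
    {"high_bp_reading"},
    {"normal_bp_reading"},
]

s, c = rule_metrics(records, "high_bp_reading", "hypertension")
print(s, c)
print(is_association_rule(records, "high_bp_reading", "hypertension", 0.4, 0.6))
```

In this toy data the rule appears in two of four records and holds for two of the three records that contain the start feature, so it passes a 0.4 support threshold and a 0.6 confidence threshold.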
Further, step S305 is executed to construct the multivariate index map
Figure GDA00035736262300000812
where $V(G_C)$ is the set of all features corresponding to all key variables, $E(G_C)$ is the set of the edges corresponding to all semantic relationships and the edges corresponding to all association rules, and
Figure GDA00035736262300000813
is a function corresponding to the association rules between the univariate indexes.
Further, step S306 is executed, and the fusion step is executed on the variable set M based on the multivariate index map. In particular, a preferred algorithm is shown below:
first, based on the multivariate index map
Figure GDA00035736262300000814
an independent feature set C'_P corresponding to each variable is established, the independent feature set C'_P satisfying the following condition: there exists no e ∈ E(G_C) such that
Figure GDA00035736262300000815
where C_i ∈ C'_P, type(e) = 0, and C_j ∈ V(G_C);
next, the variable fusion weight w_p of the variable m_p is calculated according to the following formula:
Figure GDA00035736262300000816
where m_p ∈ M, and x denotes the number of features contained in the independent feature set C'_P;
finally, a fusion step is performed on the set of variables M based on the variable fusion weights.
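The selection of the independent feature set and the count x can be sketched as follows. The weight formula itself is an image that is not reproduced in this text, so the sketch takes it as a caller-supplied function; `weight_fn` and the example lambda are placeholders only, not the patent's formula. The sketch also assumes that a type-0 edge disqualifies a feature whichever end of the edge it sits on.

```python
def independent_features(variable_features, edges):
    """Features of one variable that touch no type-0 edge.

    `edges` is an iterable of (source, target, edge_type) tuples.
    """
    linked = set()
    for source, target, edge_type in edges:
        if edge_type == 0:
            linked.update((source, target))
    return {f for f in variable_features if f not in linked}


def variable_fusion_weight(variable_features, edges, weight_fn):
    """Apply a caller-supplied weight formula to x = |C'_P|."""
    x = len(independent_features(variable_features, edges))
    return weight_fn(x)


features = ["age", "bmi", "glucose"]
edges = [("age", "blood_pressure", 0), ("bmi", "glucose", 1)]
# Placeholder formula for illustration only.
w_p = variable_fusion_weight(features, edges, lambda x: x / len(features))
```

Here "age" is excluded because it has a type-0 edge, so x is 2.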
Further, step S307 is executed, and a fusion step is executed on the semantic relation set R based on the multivariate index map. In particular, a preferred algorithm is shown below:
first, the associated variable set M' of all variables having a semantic relationship with m_p is obtained, the set M' satisfying the following condition: for any m_i ∈ M', there exists r_pi ∈ R such that
Figure GDA00035736262300000817
or
Figure GDA00035736262300000818
where M' ⊆ M, r_pi is a semantic relationship corresponding to m_p, and the fusion weight corresponding to a variable m_i in the set M' is w_i;
next, the semantic relationship fusion weight of the semantic relationship r_pi is calculated according to the following formula:
Figure GDA0003573626230000091
Figure GDA0003573626230000092
where y denotes the number of variables contained in the associated variable set M';
finally, a fusion step is performed on the semantic relation set R based on the semantic relationship fusion weights.
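Collecting the associated variable set M' and the count y might look like the sketch below. It assumes that the two image conditions mean the association function maps r_pi either to <m_p, m_i> or to <m_i, m_p>, which is an inference from the definition of the data model rather than something the text states. The weight formula is again not reproduced, so only the inputs to it are gathered. The function name and the dictionary layout are hypothetical.

```python
def associated_variables(m_p, phi):
    """Variables linked to m_p by some relation, in either direction.

    `phi` maps a relation name to its (origin, endpoint) pair.
    """
    associated = set()
    for origin, endpoint in phi.values():
        if origin == m_p and endpoint != m_p:
            associated.add(endpoint)
        elif endpoint == m_p and origin != m_p:
            associated.add(origin)
    return associated


phi = {
    "r1": ("diabetes", "glucose"),
    "r2": ("bmi", "diabetes"),
    "r3": ("age", "blood_pressure"),
}
m_prime = associated_variables("diabetes", phi)
y = len(m_prime)  # number of variables in M'
```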
Further, step S308 is executed to define a data set D = {d_1, d_2, …, d_n} composed of a plurality of the basic medical data; if m_i = mod(d_i), where m_i ∈ M and d_i ∈ D, then d_i is the key factor.
Those skilled in the art will appreciate that the second embodiment differs from the first embodiment in that the second embodiment shows a specific algorithm for generating key factors based on a feature fusion method, which is easier to apply in practice.
As a variation of the first and second embodiments, the order of all the key variables in the variable set M is randomly rearranged.
As a third embodiment of the present invention, fig. 4 shows a control method for generating a key factor based on medical data, comprising the steps of:
first, step S401 is executed to define the data model mod = <M, R, φ>, where M denotes a variable set composed of a plurality of the key variables, M = {m_1, m_2, …, m_n}; R denotes a set of semantic relationships between the plurality of key variables, R = {r_1, r_2, …, r_n}; and φ denotes the association function corresponding to the semantic relationships between the key variables,
Figure GDA0003573626230000093
r_i ∈ R, <m_q, m_p> ∈ M × M, where m_q denotes the origin variable and m_p denotes the endpoint variable;
further, step S402 is executed to define a data set D = {d_1, d_2, …, d_n} composed of a plurality of the basic medical data; if m_i = mod(d_i), where m_i ∈ M and d_i ∈ D, then d_i is the key factor;
Those skilled in the art will understand that steps S401 and S402 can be understood with reference to the foregoing detailed description, the first embodiment, the second embodiment, and their variations, and are not described again here.
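Steps S401 and S402 can be read as a small data structure plus a filter, sketched below. The text does not say how mod(d_i) is evaluated, so the sketch assumes it is a classification function returning a variable name (or nothing) for each datum; `DataModel`, `classify`, and the record layout are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DataModel:
    variables: set   # M: the key variables
    relations: set   # R: names of the semantic relationships
    phi: dict        # r_i -> (origin variable m_q, endpoint variable m_p)
    classify: Callable[[dict], Optional[str]]  # assumed stand-in for mod(d_i)

    def __call__(self, datum):
        return self.classify(datum)


def key_factors(model, dataset):
    """Return every datum d_i whose mod(d_i) is a key variable in M."""
    return [d for d in dataset if model(d) in model.variables]


model = DataModel(
    variables={"glucose", "bmi"},
    relations={"r1"},
    phi={"r1": ("bmi", "glucose")},
    classify=lambda datum: datum.get("field"),
)
dataset = [
    {"field": "glucose", "value": 6.1},
    {"field": "shoe_size", "value": 42},
    {"value": 3},
]
selected = key_factors(model, dataset)
```

Only the first record maps to a key variable, so it is the only key factor selected.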
Further, step S403 is executed to acquire the basic medical data corresponding to a plurality of source terminals, the source terminals being selected by manual screening. Those skilled in the art will understand that, to improve the accuracy of the key factors, this step retrieves the basic medical data of the source terminals so that the key factors obtained in the foregoing steps can be corrected. Specifically, manual screening of the source terminals means that, in a conventional medical big data processing project, representative user terminals encountered during processing are manually marked, and the marked user terminals are then used as the source terminals. In a typical application scenario, during day-to-day data processing a user terminal raises an objection to a result produced by the system, and the objection is manually reviewed and the user terminal marked.
Further, step S404 is executed to screen out at least one correction factor from the basic medical data corresponding to the plurality of source terminals based on the data model. Specifically, this step may be implemented as described for Fig. 1 and the subsequent embodiments, which is not repeated here. The difference is that the source of the basic medical data used to generate the key factors in Fig. 1 is not restricted, whereas the basic medical data used to generate the correction factors is restricted to the source terminals.
Further, step S405 is executed, and a correction step is executed on the key factor based on the correction factor. Specifically, the correction step may be a direct substitution, or may be adjusted using a conventional data fusion method.
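The two correction options mentioned in step S405 can be sketched as follows. Direct substitution follows the text; the weighted average is only an assumed stand-in for "a conventional data fusion method", which the text does not specify. The function name `correct`, the parameter `alpha`, and the mapping from variable name to numeric value are all hypothetical.

```python
def correct(key_factors, correction_factors, alpha=None):
    """Correct key factors using correction factors for the same variables.

    alpha=None  -> direct substitution.
    0 <= alpha <= 1 -> weighted average (assumed fusion method), where
    alpha is the weight given to the correction factor.
    """
    corrected = dict(key_factors)
    for variable, correction in correction_factors.items():
        if variable not in corrected or alpha is None:
            corrected[variable] = correction
        else:
            corrected[variable] = (1 - alpha) * corrected[variable] + alpha * correction
    return corrected


key = {"glucose": 6.0, "bmi": 24.0}
fix = {"glucose": 7.0}
substituted = correct(key, fix)
blended = correct(key, fix, alpha=0.5)
```

Variables without a correction factor are left unchanged in both modes, and the input mapping is not modified.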
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention.

Claims (7)

1. A control method for generating key factors based on medical data is characterized by comprising the following steps:
a. defining a data model mod = <M, R, φ>, where M denotes a variable set composed of a plurality of key variables, M = {m_1, m_2, …, m_n}, the plurality of key variables being obtained by classification according to the medical data; R denotes a set of semantic relationships between the plurality of key variables, R = {r_1, r_2, …, r_n}; and φ denotes the association function corresponding to the semantic relationships between the key variables,
Figure FDA0003573626220000011
r_i ∈ R, <m_q, m_p> ∈ M × M, where m_q denotes the origin variable and m_p denotes the endpoint variable;
a1. constructing a multivariate index map;
i. extracting the characteristics of each key variable one by one and establishing a univariate index one by one based on the characteristics;
ii. establishing edges corresponding to semantic relationships among the univariate indexes based on the semantic relationships among the key variables;
iii. mining, pairwise, association rules among the univariate indexes having semantic relationships, and establishing edges corresponding to the association rules;
iv. constructing the multivariate index map
Figure FDA0003573626220000012
where V(G_C) is the set of all the features corresponding to all the key variables, E(G_C) is the set of the edges corresponding to all the semantic relationships and the edges corresponding to all the association rules, and
Figure FDA0003573626220000013
is a function corresponding to the association rule between the univariate indexes;
a2. performing a fusion step on the set of variables M based on the multivariate index map;
a3. performing a fusion step on the semantic relation set R based on the multivariate index map;
b. defining a data set D = {d_1, d_2, …, d_n} composed of a plurality of basic medical data; if m_i = mod(d_i), where m_i ∈ M and d_i ∈ D, then d_i is the key factor.
2. The control method according to claim 1, wherein the step ii includes the steps of:
ii1. if two key variables have a semantic relationship, determining that the two univariate indexes corresponding to the two key variables have the semantic relationship;
ii2. connecting, in sequence, the two univariate indexes having the semantic relationship.
3. The control method according to claim 2, wherein in the step iii, the association rule is mined by:
iii1. constructing a plurality of feature chains based on a plurality of features corresponding to a plurality of key variables having semantic relationships
Figure FDA0003573626220000014
The feature chain
Figure FDA0003573626220000015
satisfies the following conditions:
Figure FDA0003573626220000016
m ≠ n, m ≤ n, i ≠ j, i ≤ m, j ≤ n, |i − j| ≥ 3, and any two adjacent features contained in each feature chain have a semantic relationship, where
Figure FDA0003573626220000017
is the starting-point feature,
Figure FDA0003573626220000018
is the endpoint feature, and C denotes the feature set containing all of the features;
iii2. for the plurality of feature chains
Figure FDA0003573626220000021
calculating the probability and the conditional probability of each feature chain, and defining the minimum probability as the minimum support and the minimum conditional probability as the minimum confidence;
iii3. if the implication
Figure FDA0003573626220000022
satisfies both the minimum support and the minimum confidence, then the implication
Figure FDA0003573626220000023
is an association rule between the univariate index built on
Figure FDA0003573626220000024
and the univariate index built on
Figure FDA0003573626220000025
.
4. The control method according to claim 3, wherein the step a2 includes the steps of:
a21. based on the multivariate index map
Figure FDA0003573626220000026
establishing an independent feature set C'_P corresponding to each variable, the independent feature set C'_P satisfying the following condition: there exists no e ∈ E(G_C) such that
Figure FDA0003573626220000027
where C_i ∈ C'_P, type(e) = 0, and C_j ∈ V(G_C);
a22. calculating the variable fusion weight w_p of the variable m_p according to the following formula:
Figure FDA0003573626220000028
where m_p ∈ M, and x denotes the number of features contained in the independent feature set C'_P;
a23. performing a fusion step on the set of variables M based on the variable fusion weights.
5. The control method according to claim 4, characterized in that said step a23 is followed by the steps of:
a31. obtaining the associated variable set M' of all variables having a semantic relationship with m_p, the set M' satisfying the following condition: for any m_i ∈ M', there exists r_pi ∈ R such that
Figure FDA0003573626220000029
or
Figure FDA00035736262200000210
where M' ⊆ M, r_pi is a semantic relationship corresponding to m_p, and the fusion weight corresponding to a variable m_i in the set M' is w_i;
a32. calculating the semantic relationship fusion weight of the semantic relationship r_pi according to the following formula:
Figure FDA00035736262200000211
where y denotes the number of variables contained in the associated variable set M';
a33. performing a fusion step on the semantic relation set R based on the semantic relationship fusion weights.
6. Control method according to any of claims 1 to 5, characterized in that the order of all key variables in the set of variables M is randomly rearranged.
7. The control method according to any one of claims 1 to 5, characterized in that the step b is followed by the step of:
c. acquiring the basic medical data corresponding to a plurality of source terminals, wherein the source terminals are selected in a manual screening mode;
d. screening at least one correction factor from the basic medical data corresponding to the plurality of source terminals based on the data model, wherein the correction factor and the key factor have the same data structure;
e. a correction step is performed on the key factor based on the correction factor.
CN201810952497.5A 2018-08-14 2018-08-14 Control method for generating key factors based on medical data Active CN110827945B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810952497.5A CN110827945B (en) 2018-08-14 2018-08-14 Control method for generating key factors based on medical data

Publications (2)

Publication Number Publication Date
CN110827945A CN110827945A (en) 2020-02-21
CN110827945B true CN110827945B (en) 2022-05-27

Family

ID=69547461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810952497.5A Active CN110827945B (en) 2018-08-14 2018-08-14 Control method for generating key factors based on medical data

Country Status (1)

Country Link
CN (1) CN110827945B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184307A (en) * 2015-07-27 2015-12-23 蚌埠医学院 Medical field image semantic similarity matrix generation method
CN107145511A (en) * 2017-03-31 2017-09-08 上海森亿医疗科技有限公司 Structured medical data library generating method and system based on medical science text message
CN108319605A (en) * 2017-01-16 2018-07-24 医渡云(北京)技术有限公司 The structuring processing method and system of medical examination data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160224637A1 (en) * 2013-11-25 2016-08-04 Ut Battelle, Llc Processing associations in knowledge graphs
US10593429B2 (en) * 2016-09-28 2020-03-17 International Business Machines Corporation Cognitive building of medical condition base cartridges based on gradings of positional statements
US10818394B2 (en) * 2016-09-28 2020-10-27 International Business Machines Corporation Cognitive building of medical condition base cartridges for a medical system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Semantic Health Knowledge Graph:Semantic Integration of Heterogeneous Medical Knowledge and Services;Longxiang Shi et al.;《BioMed Research International》;20170212;第2017卷;第1-12页 *
一种基于本体语义驱动的开放生物医学数据集成方法;王凯 等;《湖北工程学院学报》;20171130;第37卷(第06期);第78-84页 *
基于本体的生物医学数据多模态语义转换方法;王凯 等;《吉首大学学报(自然科学版)》;20170725;第38卷(第04期);第38-41页 *
精准医学知识库的构建;刘雷 等;《中华医学图书情报杂志》;20180630;第27卷(第06期);第1-9页 *

Also Published As

Publication number Publication date
CN110827945A (en) 2020-02-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant