CN119153127A - Intelligent encoding method and device for infectious disease data, computer equipment and storage medium - Google Patents

Info

Publication number
CN119153127A
Authority
CN
China
Prior art keywords
result
coding
encoding
modeling
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411656948.2A
Other languages
Chinese (zh)
Inventor
刘运喜
林�建
姚宏武
霍瑞
杜明梅
陈春平
刘梦林
刘伯伟
白艳玲
李欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Xinglin Information Technology Co ltd
First Medical Center of PLA General Hospital
Original Assignee
Hangzhou Xinglin Information Technology Co ltd
First Medical Center of PLA General Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Xinglin Information Technology Co ltd, First Medical Center of PLA General Hospital filed Critical Hangzhou Xinglin Information Technology Co ltd
Priority to CN202411656948.2A priority Critical patent/CN119153127A/en
Publication of CN119153127A publication Critical patent/CN119153127A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The embodiment of the invention discloses an intelligent encoding method and device for infectious disease data, computer equipment and a storage medium. The method comprises the steps of obtaining medical text data related to infectious diseases, extracting local features from the medical text data, constructing a clinical medical knowledge graph, modeling a coding relationship to obtain a modeling result, fusing the local features and the modeling result to obtain a feature fusion result, inputting the feature fusion result into a coding decision model for coding to obtain an ICD coding result, and outputting the ICD coding result. By implementing the method provided by the embodiment of the invention, the characteristic extraction and coding prediction process of the infectious disease data can be improved, so that the accuracy and efficiency of coding are enhanced.

Description

Intelligent encoding method and device for infectious disease data, computer equipment and storage medium
Technical Field
The invention relates to a medical information processing method, in particular to an intelligent encoding method, an intelligent encoding device, computer equipment and a storage medium for infectious disease data.
Background
In modern hospital management, accurate coding of infectious diseases is critical for disease monitoring, treatment analysis, and medical insurance settlement. With the spread of electronic medical record systems, a vast amount of unstructured medical data has emerged, including medical records, laboratory results, and clinical diagnostic information. Coding such information manually is both time-consuming and error-prone, so the need for automated coding has become increasingly urgent.
Traditional ICD (International Classification of Diseases) coding relies on manual judgment and is affected by the doctor's experience and the quality of information entry, which may lead to inaccurate or missing codes. Although existing automatic coding techniques have made progress, challenges remain in feature extraction and contextual understanding, and coding relationship modeling is often not accurate enough. Some current techniques, such as multi-scale convolutional neural networks and graph convolutional networks, can process text features and coding relationships, but they have limited ability to handle long text and consume significant computing resources.
Therefore, a new method is needed that improves the feature extraction and coding prediction process for infectious disease data, so as to enhance the accuracy and efficiency of coding.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an intelligent encoding method, an intelligent encoding device, computer equipment and a storage medium for infectious disease data.
To achieve this purpose, the invention adopts the following technical scheme. An intelligent encoding method for infectious disease data comprises the following steps:
acquiring medical text data related to an infectious disease;
extracting local features from the medical text data;
constructing a clinical medical knowledge graph, and modeling the coding relationship to obtain a modeling result;
fusing the local features with the modeling result to obtain a feature fusion result;
inputting the feature fusion result into a coding decision model for coding to obtain an ICD coding result; and
outputting the ICD coding result.
In a further technical scheme, extracting local features from the medical text data comprises the following steps:
performing text decomposition and embedding on the medical text data to obtain an embedding matrix;
performing a convolution operation on the embedding matrix to obtain a convolution result;
performing max pooling on the convolution result to obtain a pooling result; and
performing feature map concatenation and fully connected processing on the pooling result to obtain the local features.
In a further technical scheme, constructing the clinical medical knowledge graph and modeling the coding relationship to obtain a modeling result comprises:
constructing a clinical medical knowledge graph, and modeling the coding relationship with a graph neural network to obtain the modeling result.
In a further technical scheme, constructing the clinical medical knowledge graph and modeling the coding relationship with the graph neural network to obtain the modeling result comprises the following steps:
constructing a clinical medical knowledge graph;
performing multi-layer graph convolution processing on the clinical medical knowledge graph to obtain node embedded representations;
extracting global features from the node embedded representations using a pooling operation;
calculating the similarity between nodes according to the global feature representation to determine the dependency relationships between the nodes; and
combining the node embedded representations and the dependency relationships between the nodes to obtain the modeling result.
In a further technical scheme, performing multi-layer graph convolution processing on the clinical medical knowledge graph to obtain node embedded representations comprises the following steps:
aggregating information from neighbor nodes in the clinical medical knowledge graph and calculating the weighted sum of features for each node; and
applying the weight matrix and bias term of the graph convolution layer, and performing a nonlinear transformation using a ReLU activation function to obtain the node embedded representation.
In a further technical scheme, fusing the local features with the modeling result to obtain a feature fusion result comprises the following steps:
concatenating the local features with the modeling result to obtain a high-dimensional feature vector;
performing dimension reduction on the high-dimensional feature vector to obtain a dimension reduction result; and
inputting the dimension reduction result into an ensemble learning model for feature fusion to obtain the feature fusion result.
In a further technical scheme, inputting the feature fusion result into a coding decision model for coding to obtain an ICD coding result comprises:
inputting the feature fusion result into the coding decision model for coding, and determining the ICD coding result using fuzzy logic;
wherein the coding decision model is a multi-level neural network model constructed using a reinforcement learning framework.
The invention also provides an intelligent encoding device for infectious disease data, which comprises:
A data acquisition unit for acquiring medical text data related to an infectious disease;
A local feature extraction unit for extracting local features from the medical text data;
the modeling unit is used for constructing a clinical medical knowledge graph and modeling the coding relationship to obtain a modeling result;
The fusion unit is used for fusing the local features with the modeling result to obtain a feature fusion result;
The coding unit is used for inputting the characteristic fusion result into a coding decision model for coding so as to obtain an ICD coding result;
and the output unit is used for outputting the ICD coding result.
The invention also provides a computer device which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the method when executing the computer program.
The present invention also provides a storage medium storing a computer program which, when executed by a processor, implements the above method.
Compared with the prior art, the invention has the following beneficial effects: medical text data related to infectious diseases is acquired and local features are extracted from it; a clinical medical knowledge graph is constructed and the coding relationship is modeled; the local features and the modeling result are fused; and the fusion result is input into the coding decision model to generate and output the ICD coding result. This improves the feature extraction and coding prediction process for infectious disease data and thereby enhances coding accuracy and efficiency.
The invention is further described below with reference to the drawings and specific embodiments.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of an application scenario of an intelligent encoding method for infectious disease data according to an embodiment of the present invention;
Fig. 2 is a flow chart of an intelligent encoding method for infectious disease data according to an embodiment of the present invention;
FIG. 3 is a schematic sub-flowchart of an intelligent encoding method for infectious disease data according to an embodiment of the present invention;
FIG. 4 is a schematic sub-flowchart of an intelligent encoding method for infectious disease data according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a sub-flow of an intelligent encoding method for infectious disease data according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a sub-flowchart of an intelligent encoding method for infectious disease data according to an embodiment of the present invention;
FIG. 7 is a schematic block diagram of an intelligent encoding device for infectious disease data provided in an embodiment of the present invention;
FIG. 8 is a schematic block diagram of a local feature extraction unit of an intelligent encoding device for infectious disease data provided in an embodiment of the present invention;
FIG. 9 is a schematic block diagram of a modeling unit of an intelligent encoding device for infectious disease data provided by an embodiment of the present invention;
FIG. 10 is a schematic block diagram of a multi-layer graph convolution subunit of an intelligent encoding device for infectious disease data provided in an embodiment of the present invention;
FIG. 11 is a schematic block diagram of a fusion unit of an intelligent encoding device for infectious disease data provided by an embodiment of the present invention;
Fig. 12 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to Fig. 1 and Fig. 2, Fig. 1 is a schematic diagram of an application scenario of the intelligent encoding method for infectious disease data according to an embodiment of the present invention, and Fig. 2 is a schematic flow chart of the method. The intelligent encoding method for infectious disease data is applied to a server, which exchanges data with a terminal. The server obtains medical text data from an electronic medical record system and extracts text features with a deep convolutional neural network (CNN) to capture important information in the disease description. It constructs a clinical medical knowledge graph and processes the complex relations between disease codes with a graph neural network (GNN) to achieve deeper semantic understanding. The features extracted by the CNN and the GNN are fused with an ensemble learning method to improve the expressive capacity of the model, and model training is optimized through reinforcement learning and meta-learning to improve its adaptability and effect. Coding decisions are made with a fuzzy logic system and a dynamic threshold adjustment strategy so that the finally generated ICD codes are accurate. The method aims to improve the accuracy of disease coding while effectively processing a large amount of complex medical data, thereby improving the efficiency of medical services.
As shown in Fig. 2, the method includes the following steps S110 to S160.
S110, acquiring medical text data related to infectious diseases.
In this embodiment, the medical text data includes relevant records of infectious diseases, such as medical record descriptions, laboratory test results, and doctor diagnoses.
Specifically, medical text data is received from a hospital electronic medical record system. It mainly comprises medical record descriptions, laboratory test results, and doctors' diagnostic information. The patient's symptoms, medical history, and physical signs are extracted from the medical record to identify information related to infectious diseases; these descriptions help in understanding the patient's specific condition. Laboratory test results (e.g., blood tests and urine tests) are obtained to help confirm the presence and severity of an infectious disease. Diagnostic conclusions about infectious diseases, including the primary diagnosis, etiology analysis, and treatment advice, are extracted from the doctor's diagnostic reports.
The medical text data, whether stored in a structured or unstructured format, is preprocessed by denoising, word segmentation, and entity recognition to ensure data quality:
Data normalization: denoising and format normalization are performed on the received data so that the data format is unified for subsequent processing.
Word segmentation and named entity recognition: the medical text is segmented into words, and entities related to infectious diseases, such as disease names, symptoms, and test indicators, are extracted using named entity recognition (NER).
Labeling and model training: each text record is labeled with its corresponding ICD code, providing labeled data for subsequent model training and helping to improve the accuracy of disease coding.
Through these steps, key information can be extracted from the medical text more effectively, providing support for subsequent analysis and decision making.
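The preprocessing steps above can be illustrated with a minimal sketch. It is not the embodiment's implementation: the lexicon `ENTITY_LEXICON`, the helper names, the regular expressions, and the label `A15.0` are all hypothetical placeholders, and a dictionary lookup stands in for a trained NER model.

```python
import re

# Hypothetical lexicon standing in for a trained NER model (assumption).
ENTITY_LEXICON = {
    "tuberculosis": "DISEASE",
    "fever": "SYMPTOM",
    "cough": "SYMPTOM",
}

def normalize(text: str) -> str:
    """Data normalization: lowercase, drop noise characters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9.,;:%\s-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text: str) -> list:
    """Word segmentation: split into alphanumeric tokens (hyphenated words kept whole)."""
    return re.findall(r"[a-z0-9%]+(?:-[a-z0-9]+)*", text)

def tag_entities(tokens: list) -> list:
    """Entity recognition by dictionary lookup (illustrative only)."""
    return [(tok, ENTITY_LEXICON[tok]) for tok in tokens if tok in ENTITY_LEXICON]

def build_example(raw_text: str, icd_code: str) -> dict:
    """Labeling: pair the processed record with its ICD code for later training."""
    tokens = tokenize(normalize(raw_text))
    return {"tokens": tokens, "entities": tag_entities(tokens), "label": icd_code}

# "A15.0" is used here only as a placeholder label.
example = build_example("Patient  has FEVER and cough;  suspected tuberculosis!!", "A15.0")
```

Each record produced this way carries its token sequence, the recognized entities, and the ICD label, which is the form of labeled data the later training steps would consume.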
S120, extracting local features from the medical text data.
In this embodiment, local features refer to small segments or phrases in text, such as specific symptoms, disease names, experimental results, and the like. By sliding the convolution kernel over the text, the CNN is able to capture these local features, thereby identifying important keywords and patterns. For example, in the sentence "patient is suffering from high fever and cough", the local features may include "high fever" and "cough".
Context information representation refers to taking into account the location and relationships of local features throughout the text when interpreting them. Through multiple layers of convolution and pooling, the CNN can integrate the connections between different local features and form a more comprehensive understanding of the text. For example, the aforementioned "high fever" and "cough", combined with other descriptions of the patient (e.g., medical history, therapeutic response), may help the model better understand the condition of the entire case.
By integrating the two, the CNN can effectively extract key information in medical texts and provide rich representations to support subsequent analysis or decision making.
In one embodiment, referring to fig. 3, the step S120 may include steps S121 to S124.
S121, performing text decomposition and embedding on the medical text data to obtain an embedding matrix.
In this embodiment, the embedding matrix refers to a matrix obtained by text decomposition and embedding of medical text data.
Specifically, the medical text is segmented to form a word or character sequence. Existing word segmentation tools (e.g., stemming, spaCy, etc.) or custom word segmenters may be used for processing. For example, the sentence "patient need regular exam" may be broken down into "patient", "need", "regular", and "exam".
The segmented text is then converted into an embedding matrix. Using word embedding techniques such as Word2Vec or GloVe, or pre-trained models (e.g., BERT), each word is mapped to a fixed-dimension vector, typically tens to hundreds of dimensions, that preserves the semantic information of the word.
Word embedding captures semantic relations among words: words with similar meanings lie closer together in vector space, so the model understands context more accurately. It also converts a high-dimensional sparse representation into a low-dimensional dense representation, which reduces computational complexity and improves training efficiency. Using a pre-trained model allows existing knowledge to be used directly, which accelerates learning and improves performance.
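As a minimal sketch of step S121, the following code maps a token sequence to an embedding matrix through a lookup table. The randomly initialized table is an assumption standing in for Word2Vec, GloVe, or BERT vectors, and the function names and the dimension of 8 are illustrative only.

```python
import random

def build_vocab(tokenized_docs):
    """Assign an integer id to every token; id 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for doc in tokenized_docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab

def init_embedding_table(vocab_size, dim, seed=0):
    """Random stand-in for pre-trained word vectors (assumption)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]

def embed(tokens, vocab, table):
    """Return the embedding matrix: one row (vector) per token."""
    return [table[vocab.get(tok, 0)] for tok in tokens]

tokens = ["patient", "need", "regular", "exam"]
vocab = build_vocab([tokens])
table = init_embedding_table(len(vocab), dim=8)
embedding_matrix = embed(tokens, vocab, table)  # shape: 4 tokens x 8 dimensions
```

Words that are not in the vocabulary fall back to the `<unk>` row, so the matrix always has one row per input token.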
S122, performing a convolution operation on the embedding matrix to obtain a convolution result.
In this embodiment, the convolution result refers to the result obtained after the convolution operation is performed on the embedding matrix.
Specifically, a convolution kernel (filter) is slid over the embedding matrix to perform the convolution operation. Each convolution kernel is responsible for extracting a particular type of local feature; for example, a 3-gram convolution kernel detects combinations of three consecutive words. A weighted sum over each local region is calculated to obtain a feature map.
The features produced by the convolution operation are then transformed non-linearly, typically using ReLU (Rectified Linear Unit) as the activation function. This introduces non-linearity and increases the expressive capacity of the model.
The convolution operation can effectively extract important local features in the text, such as keywords, phrases, etc., and helps capture context information. The model can learn more complex modes by activating functions such as ReLU, prediction accuracy is improved, and limitation of a linear model is avoided.
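The convolution and activation described in step S122 can be sketched as follows. The kernel values and the two-dimensional toy embeddings are made up for illustration; a trained model would learn the kernel weights.

```python
def relu(x):
    """Rectified Linear Unit: keep positive values, clamp the rest to zero."""
    return x if x > 0.0 else 0.0

def conv1d_text(embedding_matrix, kernel, bias=0.0):
    """Slide a kernel of shape (k, dim) along the token axis.

    Each output value is the weighted sum over k consecutive token
    vectors, followed by ReLU. Returns one feature map.
    """
    k = len(kernel)
    feature_map = []
    for start in range(len(embedding_matrix) - k + 1):
        total = bias
        for i in range(k):
            row = embedding_matrix[start + i]
            for j, weight in enumerate(kernel[i]):
                total += weight * row[j]
        feature_map.append(relu(total))
    return feature_map

# Four tokens with 2-dimensional embeddings (toy values).
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
# One 3-gram kernel: spans three consecutive tokens.
kernel = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fmap = conv1d_text(emb, kernel)  # [4.0, 0.0]; the second window sums to -1 and is clamped
```

With four tokens and a kernel width of three there are two windows, so the feature map has two entries.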
S123, carrying out maximum pooling layer processing on the convolution result to obtain a pooling result.
In this embodiment, the pooling result refers to a result obtained by performing maximum pooling layer processing on the convolution result.
Specifically, after the convolution operation, a maximum pooling layer is used to select local maxima from the convolution profile. This process helps reduce the dimension of the feature map and retains important information. For example, if the feature map is two-dimensional, the feature map is scaled down by selecting the maximum value in each 2x2 region.
In addition to maximum pooling, average pooling may also be applied, with an average being calculated from each local region to extract smoother features.
The maximum pooling operation enhances the robustness of the model to small changes in input, so that the model is more stable in the face of texts in different formats.
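The 2x2 pooling example above can be sketched as below. The function `pool2d` and the toy feature map are illustrative; passing a different reducer switches between max pooling and average pooling.

```python
def mean(values):
    return sum(values) / len(values)

def pool2d(feature_map, size=2, reduce=max):
    """Apply non-overlapping size x size pooling to a 2-D feature map."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(reduce(window))
        pooled.append(pooled_row)
    return pooled

fmap = [
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 2.0, 1.0, 1.0],
    [0.0, 0.0, 5.0, 6.0],
    [1.0, 2.0, 7.0, 8.0],
]
max_pooled = pool2d(fmap)               # [[4.0, 2.0], [2.0, 8.0]]
avg_pooled = pool2d(fmap, reduce=mean)  # [[2.5, 1.0], [0.75, 6.5]]
```

Both variants halve each dimension of the feature map; max pooling keeps the strongest response in each region, while average pooling gives a smoother summary.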
S124, performing feature map concatenation and fully connected processing on the pooling result to obtain local features.
In this embodiment, multiple convolution kernels are used to generate different feature maps. These feature maps are concatenated so that the features extracted by the different convolution kernels are combined into a richer feature representation.
The pooled feature maps are flattened into a one-dimensional vector and then further processed by a fully connected layer, which maps the features into a high-dimensional space.
Through concatenation, the model can consider various local features together and improve its overall representation capability. The fully connected layer converts the low-dimensional features into a high-dimensional representation, so that the model can learn more complex decision boundaries and improve the accuracy of classification or regression tasks.
The resulting vector serves as a high-dimensional feature representation of the medical text and can be used for subsequent classification or regression tasks.
The flattened feature vector provides a unified input format that is easy to combine with other machine learning algorithms and supports subsequent analysis and decision making. The final feature vector concentrates the important information in the text, which facilitates subsequent processing and in-depth analysis and improves the efficiency and effect of medical text processing.
Through the steps, the deep convolutional neural network can effectively extract rich features in the medical text, and the understanding and analyzing capacity of the model on the text content is improved. The characteristics not only keep semantic information, but also enhance the expression capacity of the model through multi-level processing, and provide powerful support for medical analysis and decision.
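Step S124 can be sketched as a concatenation followed by a single linear layer. The pooled values, weights, and bias below are arbitrary illustrative numbers, and no activation is applied because the description does not specify one for the fully connected layer.

```python
def concat(vectors):
    """Concatenate several pooled feature vectors into one flat vector."""
    flat = []
    for vec in vectors:
        flat.extend(vec)
    return flat

def dense(x, weights, bias):
    """Fully connected layer: one output per weight row (no activation)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

# Pooled outputs of three hypothetical convolution kernels.
pooled_outputs = [[4.0], [2.0, 1.0], [0.5]]
fused = concat(pooled_outputs)  # [4.0, 2.0, 1.0, 0.5]

weights = [
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 1.0, -1.0, -2.0],
]
bias = [0.0, -0.5]
local_features = dense(fused, weights, bias)  # [2.0, -0.5]
```

The output dimension is set by the number of weight rows, so the same layer shape can map the concatenated vector to whatever width the later fusion step expects.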
S130, constructing a clinical medical knowledge graph, and modeling the coding relationship to obtain a modeling result.
In this embodiment, the modeling result refers to the node representations and relationship information obtained by processing the clinical medical knowledge graph with a graph neural network (GNN). These results can support a variety of applications, such as:
Node embedding: each node (e.g., disease, symptom, medication) is represented as a vector that captures its characteristics and relationships.
Similarity calculation: the relevance or similarity between different medical entities is judged through the similarity between nodes, which is very important for disease diagnosis and treatment scheme recommendation.
Dependency relationships: dependencies between different medical concepts are revealed to help understand their role in clinical decisions.
Knowledge reasoning: reasoning over the relations among nodes assists doctors in making more accurate diagnosis and treatment suggestions.
Decision support: data support is provided for clinical decision systems, improving the efficiency and accuracy of medical services.
These modeling results can help medical personnel to better understand complex medical knowledge systems, thereby improving the therapeutic effect of the patient.
Specifically, a clinical medical knowledge graph is constructed, and a graph neural network is adopted to model the coding relationship so as to obtain a modeling result.
Graph structure data is processed using a Graph Neural Network (GNN) to capture complex relationships and dependencies between disease encodings.
In one embodiment, referring to fig. 4, the step S130 may include steps S131 to S135.
S131, constructing a clinical medical knowledge graph.
In this embodiment, the main nodes in the knowledge graph include:
Disease code node: a code uniquely identifying each disease.
Disease name node: the name describing the specific disease.
Symptom node: the various symptoms associated with the disease.
Each node is represented by a feature vector; for example, the embedding vector of a disease code may contain various attribute information of the disease (e.g., severity, route of transmission).
Definition of relationships and edges:
The nodes are connected by edges, and the edges represent the relationships between the nodes:
Parent-child relationship: disease code nodes can be organized hierarchically. For example, a broad disease category may be the parent node of a disease subcategory.
Association strength: the edge weight between a disease node and a symptom node can be set based on statistical data or clinical research results, reflecting the strength of association between the symptom and the disease.
The embedded representation of each node can be initialized in either of two ways:
Random initialization: a random initial embedding vector is generated.
Pre-trained model: pre-training is performed on existing medical text data to obtain a more effective embedded representation and improve the performance of the model.
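A minimal sketch of such a graph follows, assuming an adjacency-dictionary layout with undirected weighted edges and random initialization. The class `KnowledgeGraph`, the node identifiers, and the weights 0.8 and 0.6 are hypothetical; real weights would come from statistical data or clinical research as described above.

```python
import random
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    node_types: dict = field(default_factory=dict)  # node id -> node type
    edges: dict = field(default_factory=dict)       # node id -> {neighbor id: weight}
    embeddings: dict = field(default_factory=dict)  # node id -> feature vector

    def add_node(self, node_id, node_type):
        self.node_types[node_id] = node_type
        self.edges.setdefault(node_id, {})

    def add_edge(self, a, b, weight=1.0):
        """Store the edge in both directions (undirected, an assumption)."""
        self.edges[a][b] = weight
        self.edges[b][a] = weight

    def init_embeddings(self, dim, seed=0):
        """Random initialization; pre-trained vectors could be loaded instead."""
        rng = random.Random(seed)
        for node_id in self.node_types:
            self.embeddings[node_id] = [rng.gauss(0.0, 0.1) for _ in range(dim)]

kg = KnowledgeGraph()
kg.add_node("A15", "disease_code")            # broad category (parent)
kg.add_node("A15.0", "disease_code")          # subcategory (child)
kg.add_node("pulmonary tuberculosis", "disease_name")
kg.add_node("cough", "symptom")
kg.add_node("fever", "symptom")

kg.add_edge("A15", "A15.0")                   # parent-child relationship
kg.add_edge("A15.0", "pulmonary tuberculosis")
kg.add_edge("A15.0", "cough", weight=0.8)     # association strength (made-up value)
kg.add_edge("A15.0", "fever", weight=0.6)     # association strength (made-up value)
kg.init_embeddings(dim=4)
```

Keeping the edge weights alongside the adjacency structure lets a later graph convolution step use them as aggregation coefficients if desired.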
S132, performing multi-layer graph convolution processing on the clinical medical knowledge graph to obtain node embedded representations.
In one embodiment, referring to fig. 5, the step S132 may include steps S1321 to S1322.
S1321, aggregating information from neighbor nodes in the clinical medical knowledge graph and calculating the weighted sum of features for each node;
S1322, applying the weight matrix and bias term of the graph convolution layer, and performing a nonlinear transformation using a ReLU activation function to obtain the node embedded representation.
In this embodiment, the embedded representation $h_v^{(0)}$ of each node is initialized; random initialization or embedding based on a pre-trained model may be employed. For each node $v$, each graph convolution layer $l$ performs the following steps:
The weighted sum of the neighbor node features of each node is calculated through information aggregation of the neighbor nodes, with the formula: $a_v^{(l)} = \sum_{u \in \mathcal{N}(v)} \frac{1}{c_{vu}} W^{(l)} h_u^{(l)} + b^{(l)}$, where $\mathcal{N}(v)$ is the set of neighbors of node $v$, $c_{vu}$ is the normalization coefficient, $W^{(l)}$ is the weight matrix of the current layer, and $b^{(l)}$ is the bias term.
The aggregated result is then transformed non-linearly to give the updated node representation: $h_v^{(l+1)} = \sigma\left(a_v^{(l)}\right)$, where $\sigma$ is typically the ReLU activation function, which enhances the expressive power of the model.
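One graph convolution layer of this kind can be sketched as follows. The sketch assumes that the normalization coefficient of a node equals its number of neighbors (mean aggregation) and that the same weight matrix and bias are reused for the second layer; both choices, and all numeric values, are illustrative only.

```python
def matvec(matrix, vector):
    """Multiply a weight matrix by a feature vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def gcn_layer(adjacency, features, weight, bias):
    """Aggregate transformed neighbor features, add the bias, apply ReLU."""
    updated = {}
    for node, neighbors in adjacency.items():
        aggregated = [0.0] * len(bias)
        for neighbor in neighbors:
            coeff = 1.0 / len(neighbors)  # assumed normalization: node degree
            message = matvec(weight, features[neighbor])
            aggregated = [a + coeff * m for a, m in zip(aggregated, message)]
        updated[node] = [max(0.0, a + b) for a, b in zip(aggregated, bias)]
    return updated

# Toy graph: one disease node "d" linked to two symptom nodes.
adjacency = {"d": ["s1", "s2"], "s1": ["d"], "s2": ["d"]}
h0 = {"d": [1.0, 0.0], "s1": [0.0, 2.0], "s2": [2.0, 0.0]}
W = [[1.0, 1.0], [1.0, -1.0]]
b = [0.0, -0.5]

h1 = gcn_layer(adjacency, h0, W, b)  # first layer
h2 = gcn_layer(adjacency, h1, W, b)  # second layer (weights reused for brevity)
```

Stacking layers lets each node's embedding absorb information from nodes that are more than one edge away, which is the purpose of the multi-layer processing in step S132.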
S133, extracting global features from the node embedded representation by using pooling operation.
In this embodiment, after completion of multi-layer graph convolution, a pooling operation is applied to extract global features.
Specifically, key features are selected from the neighbors of each node, the dimensions are reduced, and important information is reserved. This process helps control the over-fitting and improves the efficiency of the model.
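A simple way to illustrate this pooling is an element-wise readout over the node embeddings, sketched below. Treating the pooling as a graph-level max or mean readout is an assumption; the description does not fix the exact pooling scheme, and the function name and values are illustrative.

```python
def global_pool(node_embeddings, mode="max"):
    """Reduce all node vectors to a single global feature vector.

    "max" keeps the strongest value per dimension; "mean" averages them.
    """
    columns = list(zip(*node_embeddings.values()))
    if mode == "max":
        return [max(col) for col in columns]
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    raise ValueError(f"unsupported pooling mode: {mode}")

node_embeddings = {"d": [2.0, 0.0], "s1": [1.0, 0.5], "s2": [1.0, 0.5]}
global_max = global_pool(node_embeddings)            # [2.0, 0.5]
global_mean = global_pool(node_embeddings, "mean")   # per-dimension average
```

Whatever the number of nodes, the result has the same length as a single node embedding, which is how pooling reduces dimensionality while retaining the dominant features.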
S134, calculating the similarity between the nodes according to the global feature representation so as to determine the dependency relationship between the nodes.
In this embodiment, after the final node embedded representation is obtained, important relationships are identified by computing the similarity between nodes. The specific methods include:
Inner product: the inner product between node embedding vectors is calculated to obtain a similarity score.
Distance measurement: methods such as Euclidean distance or cosine similarity are used.
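The similarity measures named above can be sketched as follows. The helper `dependent_pairs` and its threshold of 0.7 are hypothetical; they only illustrate one way a similarity score could be turned into a dependency decision.

```python
import math
from itertools import combinations

def dot(a, b):
    """Inner product of two embedding vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Inner product normalized by vector lengths; 0.0 if either vector is zero."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm > 0.0 else 0.0

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dependent_pairs(embeddings, threshold=0.7):
    """Return node pairs whose cosine similarity reaches the threshold."""
    return [(u, v) for u, v in combinations(sorted(embeddings), 2)
            if cosine_similarity(embeddings[u], embeddings[v]) >= threshold]

embeddings = {"d": [1.0, 0.0], "s1": [1.0, 1.0], "s2": [0.0, 1.0]}
score = dot(embeddings["d"], embeddings["s1"])                   # 1.0
cos = cosine_similarity(embeddings["d"], embeddings["s1"])       # about 0.707
dist = euclidean_distance(embeddings["d"], embeddings["s1"])     # 1.0
pairs = dependent_pairs(embeddings)  # [("d", "s1"), ("s1", "s2")]
```

The inner product is sensitive to vector length, whereas cosine similarity compares direction only, so the two can rank node pairs differently.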
S135, combining the node embedded representation and the dependency relationship among the nodes to obtain a modeling result.
Specifically, the role of the node embedded representation is as follows.
The node embedded representation is a vector representation generated by a graph neural network (GNN) from the features and neighbor relations of each node in the knowledge graph. These embedding vectors contain the properties of the node itself and its relationships to other nodes. Their specific roles are as follows:
1. Providing a high-dimensional representation of the node. In the knowledge graph, node embedding transforms the complex characteristics of each node into a high-dimensional vector. These vectors capture the semantic information of the nodes and the relationships between them. The embedded representation of a disease code node can contain multidimensional feature information such as the common symptoms, disease course, and treatment mode of a specific disease, and this high-dimensional embedding enables the system to perform more accurate matching during coding.
2. Enriching context information and enhancing prediction capability. Such embeddings can represent the interdependencies and influence between nodes and capture complex semantic relationships. For example, a disease code node captures its strength of association with a symptom through its neighbor relation to the symptom node. If multiple disease code nodes share similar symptom information, their embedded representations become similar, which helps the system identify links between these diseases during coding.
3. Supporting feature fusion with the output of other models. In the subsequent feature fusion process, the node embedded representation is used together with the medical text features extracted by the CNN. This multi-modal feature fusion provides more comprehensive disease coding decision support for the system.
The features extracted by the CNN focus mainly on the local patterns and language information of the medical text, whereas the node embedded representation mainly reflects the context and structural relationships in the knowledge graph. Combining the two therefore makes full use of both the detailed information in the text and the global information in the knowledge graph, achieving a complementary effect.
The role of the inter-node dependencies is as follows:
The dependency relationships among nodes are computed on the knowledge graph by the GNN and reflect the mutual influence or association between nodes (such as disease codes and symptoms). Their role in coding decisions is as follows:
Identifying complex diseases associated with symptoms:
In clinical practice, certain diseases have a strong dependence on specific symptoms, and these dependencies are reflected by the edge weights and feature propagation in GNNs. In the decision making process of disease coding, the system can utilize this dependency to improve accuracy.
For example, if the relationship between a particular symptom node and multiple disease encoding nodes is very strong (e.g., a symptom is a common manifestation of multiple diseases), the system may further confirm the rationality of the encoding selection by these dependencies.
In addition, some diseases have a parent-child relationship: for example, the dependency between a subclass disease (such as an infection caused by a specific virus) and its parent class (such as viral infection in general) can also be represented by this dependency model. These parent-child relationships help find the best match among similar codes and reduce errors.
Similarity calculation in support of coding decisions:
On the basis of the dependency relationship among the nodes, the system can perform node similarity calculation, so that the node with higher correlation is identified. This similarity calculation helps to capture interactions and the degree of similarity between different nodes, and is particularly effective when screening between similar disease codes.
Inner product or distance measure: with either, the system can calculate the similarity between two nodes, such as disease code nodes or symptom nodes. If the similarity between two code nodes is high, the system can judge that they may have similar meanings in medical semantics.
Nodes with high similarity may share the same symptoms or clinical manifestations, and the system can use the dependency relationships to further distinguish or cluster similar disease codes. This information helps the system make a finer-grained disease code selection during coding decisions.
Optimizing hierarchical processing of disease codes:
Disease codes often have a hierarchical structure, e.g., some disease codes have a parent-child relationship or different levels of classification. The dependency relationship among the nodes can help the system to better handle the hierarchical structure problem during encoding, and ensures that the encoding selection accords with a standardized classification system.
Parent-child code selection: in some complex cases a disease may belong to a multi-level classification, and the system can choose reasonably between parent and child codes through the dependency relationships between disease code nodes. For example, a particular infectious disease may be a subclass of a higher-level class (e.g., "infectious disease"); through this relationship the system can choose the appropriate code more precisely.
The specific application of node embedded representation and dependency relationship in feature fusion:
The system combines the node embedded representation with the text features extracted by the CNN to form a multidimensional feature vector. The key to this step is how to effectively fuse features of different sources.
Through an ensemble learning method (such as a random forest), the system improves the robustness of the model on the basis of the fused node embeddings and text features. Ensemble learning can combine different features, reduce the bias of any single model by integrating multiple models, and enhance prediction stability.
Use in coding decisions:
In the final encoding decision process, the system uses node similarity calculations and dependencies to optimize the selection of the encoding. The uncertainty is processed through the fuzzy logic, and the system can dynamically adjust the threshold value, so that the code selection is more flexible and accurate.
Uncertainty processing: based on the node embedded representations and similarity, the system can estimate the prediction probability of different codes. If the similarity of some codes is high, the system prefers these codes and adjusts the selection dynamically according to the corresponding threshold.
Dependency-assisted decision: the dependencies between nodes provide rich auxiliary information. For example, using the dependencies between symptom nodes and disease code nodes, the system can make more accurate coding decisions and ensure that the selected codes closely match the patient's symptoms.
In this embodiment, the embedded representation of each node is ultimately output, which representations are used for subsequent feature fusion and encoding decisions. The embedded information is rich in relation characteristics, so that the understanding and accuracy of intelligent coding of infectious diseases are remarkably enhanced.
The advantages are as follows: different types of medical information can be effectively integrated through the graph structure, capturing complex relations and dependencies; the graph convolutional network learns rich node representations through multi-level aggregation and nonlinear transformation, improving the representational capacity of the model; the improved node similarity calculation supports disease diagnosis and treatment planning, helping clinicians make more accurate decisions; and the combination of the knowledge graph and the graph convolutional network scales well and can adapt to continuously growing data and knowledge.
And S140, fusing the local features and the modeling result to obtain a feature fusion result.
In this embodiment, the feature fusion result is a comprehensive feature expression obtained by combining the local feature with the modeling result. The goal of this process is to combine local features and global relationship information to improve the accuracy of subsequent encoding decisions.
In one embodiment, referring to fig. 6, the step S140 may include steps S141 to S143.
S141, splicing the local features with the modeling result to obtain a high-dimensional feature vector.
In this embodiment, the high-dimensional feature vector is a vector obtained by splicing the local feature and the modeling result.
Feature vectors from the CNN and the GNN are combined into one higher-dimensional feature vector by a splicing (concatenation) operation. This operation fuses the information from the medical text with the information from the graph-structured data. Integrating the features in this way lets the model make full use of the available context, so that the feature representation is more comprehensive.
S142, performing dimension reduction processing on the high-dimensional feature vector to obtain a dimension reduction result.
In this embodiment, the dimension reduction result refers to a result obtained after dimension reduction of the high-dimension feature vector.
To reduce computational complexity and prevent overfitting, the spliced high-dimensional feature vectors are then subjected to dimension reduction. Common dimension reduction techniques include Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), which reduce the feature dimension while preserving important feature information. This not only improves computational efficiency but also helps the model focus on the most discriminative features and avoids interference from redundant information.
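Splicing followed by PCA can be sketched in pure Python as follows. This is a minimal sketch under stated assumptions: feature vectors are lists of floats, principal directions are found by power iteration with deflation (one of several ways to compute PCA, not necessarily the one used in the embodiment), and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def splice(local: List[float], graph: List[float]) -> List[float]:
    """Concatenate a text-feature vector and a graph-feature vector."""
    return list(local) + list(graph)


def _mat_vec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _top_eigenvector(cov: Matrix, iterations: int = 200) -> List[float]:
    """Power iteration; the fixed start vector is an arbitrary choice."""
    v = [float(i + 1) for i in range(len(cov))]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = _mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    return v


def pca_reduce(rows: Matrix, k: int) -> Matrix:
    """Project each row onto the k directions of largest variance."""
    n, dim = len(rows), len(rows[0])
    if n < 2 or not 1 <= k <= dim:
        raise ValueError("need at least two rows and 1 <= k <= dimension")
    means = [sum(r[d] for r in rows) / n for d in range(dim)]
    centered = [[r[d] - means[d] for d in range(dim)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    components = []
    for _ in range(k):
        v = _top_eigenvector(cov)
        eigenvalue = sum(a * b for a, b in zip(v, _mat_vec(cov, v)))
        components.append(v)
        # Deflation: remove the variance already explained by v.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]
    return [[sum(a * b for a, b in zip(r, c)) for c in components]
            for r in centered]
```

In practice a numerical library would normally be used for this step; the sketch only makes the order of operations (splice, center, find directions of largest variance, project) explicit.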
S143, inputting the dimension reduction result into an integrated learning model to perform feature fusion so as to obtain a feature fusion result.
In this embodiment, the dimension-reduced feature vectors are then input into an ensemble learning model, such as a random forest or gradient-boosted trees. Ensemble learning improves overall accuracy and stability by combining the prediction results of multiple base learners. In this process, each base learner is trained using the same feature set, and a combined prediction result is finally generated by weighted voting or averaging.
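The two combination rules mentioned above, weighted voting and averaging, can be sketched as follows. This is an illustration only: the base learners themselves are not shown, the labels are placeholder code strings, and the function names and weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def weighted_vote(predictions: Sequence[Tuple[str, float]]) -> str:
    """Each base learner contributes (predicted_label, weight)."""
    if not predictions:
        raise ValueError("need at least one prediction")
    totals: Dict[str, float] = defaultdict(float)
    for label, weight in predictions:
        totals[label] += weight
    # Sorting first makes tie-breaking deterministic (alphabetical).
    return max(sorted(totals), key=lambda label: totals[label])


def average_probabilities(
    outputs: Sequence[Dict[str, float]]
) -> Dict[str, float]:
    """Average per-label probabilities over all base learners."""
    if not outputs:
        raise ValueError("need at least one learner output")
    labels = set().union(*outputs)
    return {
        label: sum(out.get(label, 0.0) for out in outputs) / len(outputs)
        for label in labels
    }
```

Voting suits learners that output a single label, while averaging suits learners that output a probability per candidate code.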
The resulting fusion features will be used as input to the coding decision model. The feature set provides comprehensive information support, and the accuracy and performance of intelligent coding of infectious diseases are obviously enhanced.
This approach has the following advantages: combining the features of the CNN and the GNN lets the model draw information from several data types, increasing the diversity of information; dimension reduction removes redundant features and reduces model complexity, so that the model generalizes better to new data; the ensemble learning model improves the accuracy and reliability of prediction by combining multiple base learners; and this feature fusion approach can be applied to various intelligent decision systems, in fields such as medicine and finance.
Taken together, these steps allow the feature fusion scheme to improve model performance and meet practical application requirements.
The ensemble learning method (such as a random forest), rather than a single multi-layer perceptron, is used because multi-model fusion improves the accuracy and stability of prediction.
And S150, inputting the characteristic fusion result into a coding decision model for coding so as to obtain an ICD coding result.
In this embodiment, the ICD encoding result refers to the code assigned to a disease or health condition under the International Classification of Diseases (ICD) system. ICD codes are used to standardize medical records, statistics, and research, helping doctors, researchers, and public health institutions to accurately record and analyze disease data.
In practice, ICD encoding results are typically obtained by analyzing patient clinical data, symptoms, diagnoses, etc. and mapping them to corresponding codes in the ICD system. This process may utilize a machine learning model to improve the accuracy and efficiency of encoding.
Specifically, inputting the feature fusion result into a coding decision model for coding, and determining an ICD coding result by adopting fuzzy logic;
The coding decision model is a multi-level neural network model constructed by using a reinforcement learning framework.
The reinforcement learning framework is applied to the coding decision model to optimize the coding decision process. Model parameters are dynamically adjusted through a reward mechanism, and the adaptability to different cases is improved. Meanwhile, a meta-learning strategy is introduced so that the model can be quickly adapted to the task of new disease coding.
The fused features are input into the coding decision model. A fuzzy logic system handles the uncertainty of the predicted outcome, and the final code selection is optimized by a dynamic threshold adjustment strategy. The predicted probability value of each ICD code is compared with the adjusted threshold, and the ICD codes that meet the requirement are selected as the final disease code output, completing code assignment.
In particular, building a model for coding decisions typically employs a multi-level neural network structure. This structure can effectively process complex input data so that the model can capture and express various characteristic relationships in the data.
Model input: the input of the model is the fused feature vector, which integrates various disease-related information such as the patient's medical records and examination results.
Model output: the model outputs the corresponding disease codes, and the prediction probability of each code reflects how well that code matches the current input features.
By introducing a reinforcement learning algorithm (e.g., Q-learning or a deep Q-network), the model is able to select an optimal action $a$ in each prediction based on the current state $s$ and update the Q value based on the reward $r$ obtained. The specific update formula is as follows:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

where $\alpha$ is the learning rate, $\gamma$ is the discount factor, and $s'$ is the next state.
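A tabular version of the Q-value update can be sketched as follows. This is a minimal sketch of standard Q-learning, assuming states and actions are hashable labels stored in a dictionary; the embodiment's deep Q-network variant, its state encoding, and its reward design are not specified, so the function name `q_update` and the example values are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,   # learning rate
    gamma: float = 0.9,   # discount factor
) -> float:
    """Apply one Q-learning update in place and return the new Q value."""
    # Best value reachable from the next state (unseen pairs count as 0.0).
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    current = q.get((state, action), 0.0)
    # Move the current estimate toward reward + discounted future value.
    q[(state, action)] = current + alpha * (reward + gamma * best_next - current)
    return q[(state, action)]
```

Here an action would correspond to choosing a candidate code, and the reward to feedback on whether that choice was correct.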
This dynamic update mechanism enables the model to gradually improve its decision strategy, thereby improving the accuracy of the encoding.
In addition, the introduction of meta-learning strategies enables models to quickly adapt to new disease encoding tasks. By training on multiple related tasks, the model can learn shared knowledge, thereby speeding up the response to new conditions.
Upon encountering a new task, the model can be quickly adjusted using previously learned knowledge without having to train from scratch. Even in the case of data scarcity, the model can be effectively learned and inferred through the existing experience.
During model training, periodic performance assessment is critical. Techniques such as cross-validation and the hold-out method help ensure the generalization ability of the model and avoid overfitting.
Cross-validation: the data set is divided into several subsets, and the performance of the model is evaluated by training and testing on different subsets in turn.
Hyperparameter tuning: hyperparameters of the model (such as the learning rate and the number of hidden layers) are adjusted promptly according to the evaluation results to optimize overall performance.
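The cross-validation procedure can be sketched as follows. This is an illustrative sketch: samples are referred to by index, the split is contiguous and unshuffled, and `train_and_score` is a hypothetical callback standing in for training the coding model and scoring it on the held-out fold.

```python
from typing import Callable, List, Tuple

Fold = Tuple[List[int], List[int]]  # (train indices, test indices)


def k_fold_indices(n_samples: int, k: int) -> List[Fold]:
    """Split sample indices into k folds; each fold is the test set once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


def cross_validate(
    n_samples: int,
    k: int,
    train_and_score: Callable[[List[int], List[int]], float],
) -> float:
    """Average the score returned for each train/test split."""
    scores = [train_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

Hyperparameter tuning can then be expressed as calling `cross_validate` once per candidate setting and keeping the setting with the best average score.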
By the above method, the application of reinforcement learning framework in disease coding brings various benefits:
Improved coding accuracy: the dynamically optimized decision process effectively reduces coding errors and improves the quality and reliability of medical documents.
Enhanced model adaptability: the meta-learning strategy enables the model to adapt quickly to new coding tasks and reduces dependence on large amounts of annotated data.
Stronger generalization: periodic performance evaluation and hyperparameter tuning help the model maintain good performance under different conditions, which facilitates deployment in real clinical environments.
The accurate disease coding is not only beneficial to patient management, but also provides reliable data support for clinical decision, and improves medical efficiency.
In a word, by combining the reinforcement learning framework and the meta learning strategy, the disease coding process can be obviously optimized, and a more efficient and more accurate solution is brought to the medical industry.
In this embodiment, during disease encoding, the model outputs a series of predicted probability values that represent how well each candidate International Classification of Diseases (ICD) code matches the input features. In this way, the system can provide an explicit coding choice for each input, enhancing its interpretability and accuracy.
The model first generates un-normalized scores from the input features and then normalizes them using a softmax function. The softmax function converts these scores into probability values according to the following formula:

$$P(y_i) = \frac{e^{z_i}}{\sum_{j=1}^{N} e^{z_j}}$$

where $z_i$ is the un-normalized score of the $i$-th code and $N$ is the number of all possible codes. The resulting probability values are easy to understand and interpret, enabling medical personnel to evaluate the likelihood of each code more intuitively.
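The softmax normalization can be sketched as follows. The subtraction of the maximum score is a common numerical-stability measure added here as an assumption; it does not change the resulting probabilities.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert un-normalized scores into probabilities that sum to 1."""
    if not scores:
        raise ValueError("need at least one score")
    shift = max(scores)  # avoids overflow in exp for large scores
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Each returned value corresponds to one candidate code, in the same order as the input scores.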
To cope with uncertainty in the predicted outcome, the system employs a fuzzy logic system. The system compares the predicted probability values with a dynamically set threshold to determine the final code selection.
Dynamic threshold setting: the threshold is not fixed but is continuously adjusted based on historical data and real-time feedback. The system can therefore adapt to the characteristics and requirements of different cases and still make appropriate decisions when uncertainty is high.
The specific optimization process comprises the following steps:
Historical case data are analyzed to identify which feature combinations are associated with successful encoding results. This helps to understand which factors have the greatest impact on the encoding results.
Feedback on encoding results is collected in real time and used to continuously update the dynamic threshold. This feedback mechanism ensures that the system can learn and improve its decision-making process.
The threshold used for code selection is adjusted dynamically according to the analysis results and the feedback, so as to improve decision accuracy.
According to the adjusted dynamic threshold, the system selects the ICD codes whose predicted probability values meet the requirement as the final disease code output.
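The threshold comparison and feedback loop above can be sketched as follows. The text does not give the adjustment rule or the fuzzy membership functions, so the rule used here (a fixed step down after a confirmed code, a fixed step up after a rejected one, clamped to a range) is purely an assumption for illustration, as are the class name, parameter values, and example codes.

```python
from typing import Dict, List


class DynamicThreshold:
    """Threshold nudged by feedback; the update rule is hypothetical."""

    def __init__(self, initial: float = 0.5, step: float = 0.05,
                 lower: float = 0.1, upper: float = 0.9) -> None:
        self.value = initial
        self.step = step
        self.lower = lower
        self.upper = upper

    def update(self, accepted_code_was_correct: bool) -> float:
        # Assumed rule: relax after a confirmed code, tighten after an error.
        if accepted_code_was_correct:
            self.value -= self.step
        else:
            self.value += self.step
        # Keep the threshold inside the allowed range.
        self.value = min(self.upper, max(self.lower, self.value))
        return self.value


def select_codes(probabilities: Dict[str, float], threshold: float) -> List[str]:
    """Return codes whose probability meets the threshold, best first."""
    chosen = [code for code, p in probabilities.items() if p >= threshold]
    return sorted(chosen, key=lambda code: probabilities[code], reverse=True)
```

A usage pattern would be to call `select_codes` with the current `value`, collect reviewer feedback on the chosen code, and pass that feedback to `update` before the next case.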
This approach offers several benefits in the disease encoding process:
Improved accuracy: through normalization of probability values and the introduction of fuzzy logic, the system can make more accurate coding selections in complex and uncertain situations.
Enhanced interpretability: based on the probability output, medical personnel can understand the basis for selecting each code, which increases trust.
Strong adaptability: dynamic threshold setting allows the system to adjust flexibly to the characteristics of different cases.
Continuous optimization: through the feedback mechanism and historical data analysis, the system can continuously optimize its decision process and improve overall performance.
Clinical decision support: the final disease codes are used for subsequent clinical decision support and data analysis, accurately reflecting the patient's condition and providing a sound data basis for medical care.
In a word, through combining probability output, a fuzzy logic system and dynamic threshold adjustment, the disease coding process becomes more intelligent and efficient, and the coding quality and reliability are effectively improved.
And S160, outputting the ICD coding result.
In this embodiment, the final ICD encoding results are managed and stored, which involves the data storage structure and the clinical support functions.
According to the intelligent encoding method for infectious disease data, medical text data related to infectious diseases is obtained and local features are extracted; a clinical medical knowledge graph is constructed and the coding relationship is modeled; the local features and the modeling result are fused; and the fusion result is input into the coding decision model to generate and output the ICD coding result. This improves the feature extraction and coding prediction process for infectious disease data, thereby improving coding accuracy and efficiency.
Fig. 7 is a schematic block diagram of an intelligent encoding device 300 for infectious disease data according to an embodiment of the present invention. As shown in fig. 7, the present invention also provides an intelligent encoding apparatus 300 for infectious disease data, corresponding to the above method for intelligent encoding infectious disease data. The infectious disease data smart encoding apparatus 300 includes a unit for performing the above infectious disease data smart encoding method, and the apparatus may be configured in a server. Specifically, referring to fig. 7, the intelligent encoding apparatus 300 for infectious disease data includes a data acquisition unit 301, a local feature extraction unit 302, a modeling unit 303, a fusion unit 304, an encoding unit 305, and an output unit 306.
The data acquisition unit 301 is used for acquiring medical text data related to infectious diseases; the local feature extraction unit 302 is used for extracting local features from the medical text data; the modeling unit 303 is used for constructing a clinical medical knowledge graph and modeling the coding relationship to obtain a modeling result; the fusion unit 304 is used for fusing the local features and the modeling result to obtain a feature fusion result; the encoding unit 305 is used for inputting the feature fusion result into a coding decision model for coding to obtain an ICD coding result; and the output unit 306 is used for outputting the ICD coding result.
In an embodiment, as shown in fig. 8, the local feature extraction unit 302 includes an embedding subunit 3021, a convolution subunit 3022, a max-pooling subunit 3023, and a splice full-connection subunit 3024.
The embedding subunit 3021 is used for performing text decomposition and embedding on the medical text data to obtain an embedding matrix; the convolution subunit 3022 is used for performing a convolution operation on the embedding matrix to obtain a convolution result; the max-pooling subunit 3023 is used for performing max-pooling layer processing on the convolution result to obtain a pooling result; and the splice full-connection subunit 3024 is used for performing feature map splicing and full-connection processing on the pooling result to obtain the local features.
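The chain of convolution, max pooling, splicing, and full connection handled by these subunits can be sketched as follows. This is a minimal sketch, assuming the embedding matrix is given as one row of floats per token; the kernel values, the ReLU activation after convolution, and the function names are assumptions for illustration.

```python
from typing import List

Matrix = List[List[float]]


def conv1d_text(embeddings: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Slide a (window x dim) kernel over a (tokens x dim) embedding matrix."""
    window = len(kernel)
    if len(embeddings) < window:
        raise ValueError("text is shorter than the kernel window")
    feature_map = []
    for start in range(len(embeddings) - window + 1):
        total = bias
        for offset in range(window):
            token_vec = embeddings[start + offset]
            total += sum(e * w for e, w in zip(token_vec, kernel[offset]))
        feature_map.append(max(0.0, total))  # ReLU (assumed activation)
    return feature_map


def local_features(embeddings: Matrix, kernels: List[Matrix],
                   fc_weight: Matrix, fc_bias: List[float]) -> List[float]:
    """Max-pool each feature map, splice the results, apply a dense layer."""
    # One pooled value per kernel; the list itself is the spliced vector.
    pooled = [max(conv1d_text(embeddings, k)) for k in kernels]
    # fc_weight has one row per output unit and one column per kernel.
    return [sum(p * w for p, w in zip(pooled, row)) + b
            for row, b in zip(fc_weight, fc_bias)]
```

Kernels with different window sizes can be mixed in `kernels`, since each one is pooled down to a single value before splicing.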
In an embodiment, the modeling unit 303 is configured to construct a clinical knowledge graph, and model the coding relationship by using a graph neural network to obtain a modeling result.
In one embodiment, as shown in fig. 9, the modeling unit 303 includes a graph construction subunit 3031, a multi-layer graph convolution subunit 3032, a pooling operation subunit 3033, a similarity calculation subunit 3034, and a combination subunit 3035.
The graph construction subunit 3031 is used for constructing a clinical medical knowledge graph; the multi-layer graph convolution subunit 3032 is used for performing multi-layer graph convolution processing on the clinical medical knowledge graph to obtain a node embedded representation; the pooling operation subunit 3033 is used for extracting global features from the node embedded representation by a pooling operation; the similarity calculation subunit 3034 is used for calculating the similarity between nodes according to the global feature representation to determine the dependency relationships between nodes; and the combination subunit 3035 is used for combining the node embedded representation and the dependency relationships between nodes to obtain a modeling result.
In one embodiment, as shown in fig. 10, the multi-layer graph convolution subunit 3032 includes a weighted sum module 30321 and a transform module 30322.
The weighted sum module 30321 is used for aggregating the information of neighbor nodes in the clinical medical knowledge graph and calculating the weighted sum of features for each node; and the transform module 30322 is used for applying the weight matrix and bias term of the graph convolution layer and performing a nonlinear transformation with a ReLU activation function to obtain the node embedded representation.
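One graph convolution layer, combining the weighted sum and the transformation described for these two modules, can be sketched as follows. This is an illustrative sketch: the graph is a dictionary of edge weights, each node also keeps its own features with an implicit self-loop weight of 1.0, and no degree normalization is applied. These choices, the function name, and the example node names are assumptions, not details taken from the embodiment.

```python
from typing import Dict, List

Features = Dict[str, List[float]]


def graph_conv_layer(
    features: Features,
    neighbors: Dict[str, Dict[str, float]],  # node -> {neighbor: edge weight}
    weight: List[List[float]],               # shape (in_dim, out_dim)
    bias: List[float],                       # shape (out_dim,)
) -> Features:
    """Aggregate neighbor features, then apply W, b and ReLU."""
    out: Features = {}
    for node, own in features.items():
        # Weighted sum: start from the node's own features (self-loop).
        agg = list(own)
        for nb, edge_w in neighbors.get(node, {}).items():
            for d, val in enumerate(features[nb]):
                agg[d] += edge_w * val
        # Linear transform plus bias, followed by ReLU.
        out[node] = [
            max(0.0, sum(agg[i] * weight[i][j] for i in range(len(agg))) + bias[j])
            for j in range(len(bias))
        ]
    return out
```

Stacking several calls, each with its own `weight` and `bias`, gives the multi-layer graph convolution; the output of the last call plays the role of the node embedded representation.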
In one embodiment, as shown in fig. 11, the fusion unit 304 includes a splicing subunit 3041, a dimension-reducing subunit 3042, and a feature fusion subunit 3043.
The splicing subunit 3041 is used for splicing the local features with the modeling result to obtain a high-dimensional feature vector; the dimension-reducing subunit 3042 is used for performing dimension reduction on the high-dimensional feature vector to obtain a dimension reduction result; and the feature fusion subunit 3043 is used for inputting the dimension reduction result into an ensemble learning model for feature fusion to obtain a feature fusion result.
In an embodiment, the encoding unit 305 is configured to input the feature fusion result into an encoding decision model for encoding, and determine the ICD encoding result by using fuzzy logic, where the encoding decision model is a multi-level neural network model constructed by using a reinforcement learning framework.
It should be noted that, as will be clearly understood by those skilled in the art, the specific implementation process of the above-mentioned intelligent encoding apparatus 300 for infectious disease data and each unit may refer to the corresponding descriptions in the foregoing method embodiments, and for convenience and brevity of description, the detailed description is omitted herein.
The above-described infectious disease data smart encoding apparatus 300 may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 12.
Referring to fig. 12, fig. 12 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a server, where the server may be a stand-alone server or may be a server cluster formed by a plurality of servers.
With reference to FIG. 12, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 includes program instructions that, when executed, cause the processor 502 to perform a method of intelligent encoding of infectious disease data.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the execution of a computer program 5032 in the non-volatile storage medium 503, which computer program 5032, when executed by the processor 502, causes the processor 502 to perform a method for intelligently encoding infectious disease data.
The network interface 505 is used for network communication with other devices. It will be appreciated by those skilled in the art that the structure shown in FIG. 12 is merely a block diagram of some of the structures associated with the present inventive arrangements and does not constitute a limitation of the computer device 500 to which the present inventive arrangements may be applied, and that a particular computer device 500 may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 502 is configured to execute a computer program 5032 stored in a memory to implement the steps of:
Obtaining medical text data related to infectious diseases, extracting local features from the medical text data, constructing a clinical medical knowledge graph, modeling a coding relationship to obtain a modeling result, fusing the local features and the modeling result to obtain a feature fusion result, inputting the feature fusion result into a coding decision model for coding to obtain an ICD coding result, and outputting the ICD coding result.
In one embodiment, when the step of extracting the local feature from the medical text data is implemented by the processor 502, the following steps are specifically implemented:
The medical text data is subjected to text decomposition and embedding to obtain an embedded matrix, convolution operation is carried out on the embedded matrix to obtain a convolution result, maximum pooling layer processing is carried out on the convolution result to obtain a pooling result, and feature map splicing and full connection processing are carried out on the pooling result to obtain local features.
In one embodiment, when the processor 502 performs the step of constructing the clinical knowledge graph and modeling the coding relationship to obtain the modeling result, the following steps are specifically implemented:
Constructing a clinical medical knowledge graph, and modeling the coding relationship by adopting a graph neural network to obtain a modeling result.
In an embodiment, when the processor 502 implements the step of constructing the clinical knowledge graph and modeling the coding relationship by using the graph neural network to obtain a modeling result, the following steps are specifically implemented:
constructing a clinical medical knowledge graph, carrying out multi-layer graph convolution processing on the clinical medical knowledge graph to obtain node embedding representation, extracting global features from the node embedding representation by using pooling operation, calculating similarity between nodes according to the global feature representation to determine dependency relationship between nodes, and combining the node embedding representation and the dependency relationship between nodes to obtain a modeling result.
In one embodiment, when the processor 502 performs the multi-layer graph convolution processing on the clinical knowledge graph to obtain the node embedded representation step, the following steps are specifically implemented:
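Aggregating the information of neighbor nodes in the clinical medical knowledge graph and calculating the weighted sum of features for each node; and applying the weight matrix and bias term of the graph convolution layer and performing a nonlinear transformation with a ReLU activation function to obtain the node embedded representation.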
In an embodiment, when the step of fusing the local feature with the modeling result to obtain a feature fusion result is implemented by the processor 502, the following steps are specifically implemented:
the local features and the modeling result are spliced to obtain a high-dimensional feature vector, the high-dimensional feature vector is subjected to dimension reduction processing to obtain a dimension reduction result, and the dimension reduction result is input into an ensemble learning model to be subjected to feature fusion to obtain a feature fusion result.
In an embodiment, when the step of inputting the feature fusion result into the coding decision model to perform coding to obtain the ICD coding result is implemented by the processor 502, the following steps are specifically implemented:
Inputting the characteristic fusion result into a coding decision model for coding, and determining an ICD coding result by adopting fuzzy logic;
The coding decision model is a multi-level neural network model constructed by using a reinforcement learning framework.
It should be appreciated that in embodiments of the present application, the processor 502 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
Those skilled in the art will appreciate that all or part of the flow of the methods in the above-described embodiments may be accomplished by a computer program instructing the relevant hardware. The computer program comprises program instructions and can be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the method embodiments described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer readable storage medium. The storage medium stores a computer program which, when executed by a processor, causes the processor to perform the steps of:
Obtaining medical text data related to infectious diseases, extracting local features from the medical text data, constructing a clinical medical knowledge graph, modeling a coding relationship to obtain a modeling result, fusing the local features and the modeling result to obtain a feature fusion result, inputting the feature fusion result into a coding decision model for coding to obtain an ICD coding result, and outputting the ICD coding result.
In one embodiment, the processor, when executing the computer program to perform the step of extracting local features from the medical text data, performs the steps of:
The medical text data is subjected to text decomposition and embedding to obtain an embedded matrix, convolution operation is carried out on the embedded matrix to obtain a convolution result, maximum pooling layer processing is carried out on the convolution result to obtain a pooling result, and feature map splicing and full connection processing are carried out on the pooling result to obtain local features.
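The local feature extraction described above can be sketched, for illustration, as a small text convolution pipeline. Whitespace tokenization, the zero vector for unknown tokens, the zero padding of short texts, and the names `tokenize`, `conv1d`, and `local_features` are assumptions of the example; the embedding table, kernels, and fully connected parameters would normally be learned.

```python
from typing import Dict, List

Matrix = List[List[float]]


def tokenize(text: str) -> List[str]:
    """Text decomposition; whitespace splitting is an assumption of this sketch."""
    return text.lower().split()


def embed(tokens: List[str], table: Dict[str, List[float]], dim: int) -> Matrix:
    """Build the embedding matrix; unknown tokens map to a zero vector."""
    return [table.get(token, [0.0] * dim) for token in tokens]


def conv1d(embedded: Matrix, kernel: Matrix) -> List[float]:
    """Slide a (width x dim) kernel along the token axis to get one feature map."""
    width, dim = len(kernel), len(kernel[0])
    # Pad short texts with zero rows so at least one window exists.
    padded = embedded + [[0.0] * dim] * max(0, width - len(embedded))
    feature_map = []
    for start in range(len(padded) - width + 1):
        window = padded[start:start + width]
        feature_map.append(
            sum(w * x for k_row, e_row in zip(kernel, window) for w, x in zip(k_row, e_row))
        )
    return feature_map


def local_features(text: str, table: Dict[str, List[float]], dim: int,
                   kernels: List[Matrix], fc_weight: Matrix, fc_bias: List[float]) -> List[float]:
    """Embedding -> convolution -> max pooling -> concatenation -> fully connected layer."""
    embedded = embed(tokenize(text), table, dim)
    # Max pooling keeps the strongest response of each kernel; the pooled
    # values of all kernels are concatenated into one vector.
    pooled = [max(conv1d(embedded, kernel)) for kernel in kernels]
    return [sum(w * p for w, p in zip(row, pooled)) + b for row, b in zip(fc_weight, fc_bias)]
```

Using kernels of several widths lets the model respond to phrases of different lengths, such as a single symptom term or a two-word diagnosis.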
In one embodiment, when the processor executes the computer program to implement the step of constructing a clinical medical knowledge graph and modeling the coding relationship to obtain a modeling result, the processor specifically implements the following steps:
Constructing a clinical medical knowledge graph, and modeling the coding relationship by adopting a graph neural network to obtain a modeling result.
In one embodiment, when the processor executes the computer program to implement the step of constructing the clinical medical knowledge graph and adopting the graph neural network to model the coding relationship so as to obtain a modeling result, the processor specifically implements the following steps:
Constructing a clinical medical knowledge graph; performing multi-layer graph convolution processing on the clinical medical knowledge graph to obtain a node embedding representation; extracting global features from the node embedding representation using a pooling operation; calculating the similarity between nodes according to the global feature representation to determine the dependency relationships between nodes; and combining the node embedding representation with the dependency relationships between nodes to obtain a modeling result.
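The pooling, similarity, and combination steps above admit several readings. The following sketch shows one possible reading under stated assumptions: mean pooling produces the global feature, each node embedding is centered on that global feature before cosine similarity is computed, and node pairs whose similarity reaches a threshold are treated as dependent. The centering step, the threshold, and all names are hypothetical and not taken from the source.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def mean_pool(embeddings: Matrix) -> List[float]:
    """Global feature: the column-wise mean of all node embeddings."""
    return [sum(column) / len(embeddings) for column in zip(*embeddings)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; defined as 0 when either vector has zero length."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def model_relations(embeddings: Matrix, threshold: float) -> Dict[str, object]:
    """Combine node embeddings with thresholded pairwise dependencies."""
    global_feature = mean_pool(embeddings)
    # Assumption: similarity is measured relative to the graph-level context.
    centered = [[x - g for x, g in zip(row, global_feature)] for row in embeddings]
    dependencies: List[Tuple[int, int]] = []
    for i in range(len(centered)):
        for j in range(i + 1, len(centered)):
            if cosine(centered[i], centered[j]) >= threshold:
                dependencies.append((i, j))
    return {
        "node_embeddings": embeddings,
        "global_feature": global_feature,
        "dependencies": dependencies,
    }
```

The returned dictionary plays the role of the modeling result that is later fused with the local text features.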
In one embodiment, when the processor executes the computer program to implement the step of performing multi-layer graph convolution processing on the clinical medical knowledge graph to obtain a node embedding representation, the processor specifically implements the following steps:
Applying the weight matrix and bias term of a graph convolution layer, and performing a nonlinear transformation using a ReLU activation function, to obtain the node embedding representation.
In one embodiment, when the processor executes the computer program to realize the step of fusing the local feature and the modeling result to obtain a feature fusion result, the processor specifically realizes the following steps:
The local features and the modeling result are concatenated to obtain a high-dimensional feature vector, the high-dimensional feature vector is subjected to dimension reduction processing to obtain a dimension reduction result, and the dimension reduction result is input into an ensemble learning model for feature fusion to obtain a feature fusion result.
In one embodiment, when the processor executes the computer program to implement the step of inputting the feature fusion result into the coding decision model for coding to obtain the ICD coding result, the following steps are specifically implemented:
Inputting the feature fusion result into the coding decision model for coding, and determining the ICD coding result by adopting fuzzy logic;
wherein the coding decision model is a multi-level neural network model constructed by using a reinforcement learning framework.
The storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium that can store program code.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The steps in the methods of the embodiments of the invention may be reordered, combined and deleted according to actual needs. The units in the devices of the embodiments of the invention may be combined, divided and deleted according to actual needs. In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a terminal, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention.
While the invention has been described with reference to certain preferred embodiments, those skilled in the art will understand that various equivalent changes and substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. An intelligent encoding method for infectious disease data, characterized by comprising: obtaining medical text data related to infectious diseases; extracting local features from the medical text data; constructing a clinical medical knowledge graph and modeling the coding relationship to obtain a modeling result; fusing the local features with the modeling result to obtain a feature fusion result; inputting the feature fusion result into a coding decision model for coding to obtain an ICD coding result; and outputting the ICD coding result.
2. The intelligent encoding method for infectious disease data according to claim 1, characterized in that extracting local features from the medical text data comprises: performing text decomposition and embedding on the medical text data to obtain an embedding matrix; performing a convolution operation on the embedding matrix to obtain a convolution result; performing max pooling layer processing on the convolution result to obtain a pooling result; and performing feature map concatenation and fully connected processing on the pooling result to obtain the local features.
3. The intelligent encoding method for infectious disease data according to claim 1, characterized in that constructing a clinical medical knowledge graph and modeling the coding relationship to obtain a modeling result comprises: constructing a clinical medical knowledge graph, and modeling the coding relationship using a graph neural network to obtain the modeling result.
4. The intelligent encoding method for infectious disease data according to claim 3, characterized in that constructing a clinical medical knowledge graph and modeling the coding relationship using a graph neural network to obtain a modeling result comprises: constructing a clinical medical knowledge graph; performing multi-layer graph convolution processing on the clinical medical knowledge graph to obtain a node embedding representation; extracting global features from the node embedding representation using a pooling operation; calculating the similarity between nodes according to the global feature representation to determine the dependency relationships between nodes; and combining the node embedding representation with the dependency relationships between nodes to obtain the modeling result.
5. The intelligent encoding method for infectious disease data according to claim 1, characterized in that performing multi-layer graph convolution processing on the clinical medical knowledge graph to obtain a node embedding representation comprises: aggregating the information of neighbor nodes in the clinical medical knowledge graph and calculating a weighted sum of features for each node; and applying the weight matrix and bias term of a graph convolution layer and performing a nonlinear transformation using a ReLU activation function to obtain the node embedding representation.
6. The intelligent encoding method for infectious disease data according to claim 1, characterized in that fusing the local features with the modeling result to obtain a feature fusion result comprises: concatenating the local features with the modeling result to obtain a high-dimensional feature vector; performing dimension reduction processing on the high-dimensional feature vector to obtain a dimension reduction result; and inputting the dimension reduction result into an ensemble learning model for feature fusion to obtain the feature fusion result.
7. The intelligent encoding method for infectious disease data according to claim 1, characterized in that inputting the feature fusion result into a coding decision model for coding to obtain an ICD coding result comprises: inputting the feature fusion result into the coding decision model for coding, and determining the ICD coding result using fuzzy logic; wherein the coding decision model is a multi-level neural network model constructed using a reinforcement learning framework.
8. An intelligent encoding device for infectious disease data, characterized by comprising: a data acquisition unit for obtaining medical text data related to infectious diseases; a local feature extraction unit for extracting local features from the medical text data; a modeling unit for constructing a clinical medical knowledge graph and modeling the coding relationship to obtain a modeling result; a fusion unit for fusing the local features with the modeling result to obtain a feature fusion result; a coding unit for inputting the feature fusion result into a coding decision model for coding to obtain an ICD coding result; and an output unit for outputting the ICD coding result.
9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory stores a computer program, and the processor implements the method according to any one of claims 1 to 7 when executing the computer program.
10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN202411656948.2A 2024-11-19 2024-11-19 Intelligent encoding method and device for infectious disease data, computer equipment and storage medium Pending CN119153127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411656948.2A CN119153127A (en) 2024-11-19 2024-11-19 Intelligent encoding method and device for infectious disease data, computer equipment and storage medium


Publications (1)

Publication Number Publication Date
CN119153127A true CN119153127A (en) 2024-12-17

Family

ID=93814151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411656948.2A Pending CN119153127A (en) 2024-11-19 2024-11-19 Intelligent encoding method and device for infectious disease data, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN119153127A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382272A (en) * 2020-03-09 2020-07-07 西南交通大学 An automatic coding method of electronic medical record ICD based on knowledge graph
CN111402974A (en) * 2020-03-06 2020-07-10 西南交通大学 An automatic coding method of electronic medical record ICD based on deep learning
CN111540468A (en) * 2020-04-21 2020-08-14 重庆大学 ICD automatic coding method and system for visualization of diagnosis reason
CN111640517A (en) * 2020-05-27 2020-09-08 医渡云(北京)技术有限公司 Medical record encoding method and device, storage medium and electronic equipment
CN113326384A (en) * 2021-06-22 2021-08-31 四川大学 Construction method of interpretable recommendation model based on knowledge graph
US20210319280A1 (en) * 2020-04-07 2021-10-14 NEC Laboratories Europe GmbH Interpretable node embedding
CN115248842A (en) * 2022-06-20 2022-10-28 北京雅丁信息技术有限公司 ICD intelligent coding system based on knowledge graph and retrieval engine
WO2023065858A1 (en) * 2021-10-19 2023-04-27 之江实验室 Medical term standardization system and method based on heterogeneous graph neural network
CN116364220A (en) * 2023-01-13 2023-06-30 重庆大学 Automatic ICD coding method and system based on disease relation enhancement
CN116524248A (en) * 2023-04-17 2023-08-01 首都医科大学附属北京友谊医院 Medical data processing device, method and classification model training device
CN116705221A (en) * 2023-06-15 2023-09-05 北京理工大学 An ICD Coding Prediction Method Based on Knowledge Subgraph Fusion
CN117594246A (en) * 2023-11-21 2024-02-23 北京工业大学 An automatic ICD encoding method and system for medical text based on graph attention
CN117807956A (en) * 2023-12-29 2024-04-02 兰州理工大学 ICD automatic coding method based on clinical text tree structure


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
王贝伦: "《机器学习》", 30 November 2021, 东南大学出版社, pages: 222 *
王鸿伟: "《基于网络特征学习的个性化推荐系统》", 31 May 2022, 机械工业出版社, pages: 191 - 192 *
高敬鹏: "《深度学习:卷积神经网络技术与实践》", 31 July 2020, 机械工业出版社, pages: 214 - 215 *

US12367953B1 (en) Apparatus and a method for the generation of a medical report

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination