CN117408255A - A dual-graph neural network medical named entity recognition method based on multi-feature fusion - Google Patents
- Publication number: CN117408255A
- Application number: CN202311317519.8A
- Authority: CN (China)
- Prior art keywords: representing, word, graph, vector, feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/14—Tree-structured documents
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Abstract
The invention provides a dual-graph neural network medical named entity recognition method based on multi-feature fusion. The method preprocesses the text sequence; on one hand, the sequence is input into a bidirectional long short-term memory network to obtain contextual features; on the other hand, a word co-occurrence graph and a dependency syntax graph are constructed respectively, and a graph convolutional neural network learns the dependencies and interactions between nodes from the graph structure to obtain global feature information. The contextual feature information and the global feature information are fused, and a multi-head self-attention mechanism computes the dependencies inside the features to obtain comprehensive feature information; finally, a CRF model scores the comprehensive feature information and performs sequence labeling to obtain the optimal label sequence. Compared with the prior art, the invention extracts text features from a dual-graph perspective by constructing the word co-occurrence graph and the dependency syntax graph, and introduces a multi-head self-attention mechanism to compute the dependencies inside the features, thereby effectively improving the accuracy of medical named entity recognition.
Description
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a dual-graph neural network medical named entity recognition method based on multi-feature fusion.
Background
Named entity recognition (NER) in the medical field is an important natural language processing task that aims to identify specific entities, such as disease names, drug names, surgical procedures and diagnoses, from text. With the continuous development of medical research and practice, a great deal of Chinese electronic medical record text is generated during clinical treatment, and this data contains rich medical information. Automatically extracting key entity information from these massive texts through medical named entity recognition can support the construction of personalized medical service systems and clinical auxiliary decision support, and is of great significance for professional research in the medical field.
Compared with named entity recognition in the general domain, entities in the medical field are diverse and complex: the same drug may have multiple aliases and variant names, the same disease may follow different naming schemes, and disease names are often more complicated. Meanwhile, complex relationships exist between entities in electronic medical records, such as those between doctors and patients or between diagnoses and treatments, which makes the medical named entity recognition task more difficult.
Disclosure of Invention
The invention aims to: addressing the technical problems described in the background art, the invention provides a dual-graph neural network medical named entity recognition method based on multi-feature fusion. Contextual feature information of the sequence is extracted with a bidirectional long short-term memory network, global feature information of the sequence is obtained with a graph convolutional neural network, the contextual and global information are fused, and a self-attention mechanism computes the dependencies inside the features, thereby improving the accuracy of medical named entity recognition.
The technical scheme is as follows: the invention provides a dual-graph neural network medical named entity recognition method based on multi-feature fusion, comprising the following steps:
step 1: preprocessing data of the text sequence, and dividing a training set and a testing set;
step 2: inputting the text sequence into a bidirectional long short-term memory network module, enhancing contextual semantic associations, and obtaining a hidden-layer feature vector H_t containing context information;
Step 3: respectively constructing a word co-occurrence diagram and a dependency syntax diagram according to the text sequence;
step 4: the graph convolutional neural network learns the dependencies and interactions between nodes from the node relations of the graph structure, obtaining global feature information H_C that fuses the word co-occurrence node features and the dependency syntax features;
Step 5: carrying out feature fusion on the context feature information obtained in the step 2 and the global feature information obtained in the step 4, and calculating the dependency relationship inside the feature by adopting a multi-head self-attention mechanism to obtain comprehensive feature information;
step 6: scoring the comprehensive feature information obtained in step 5 with a CRF model and performing sequence labeling to obtain the optimal label sequence.
Further, the specific method of the step 1 is as follows:
step 1.1: performing text word segmentation, text normalization, text cleaning and removal of stop words and low-frequency words on the text sequence;
and step 1.2, performing a shuffle operation on the data set, and dividing the data set into a training set and a testing set according to the proportion of 7:3.
Further, the specific method of the step 2 is as follows:
step 2.1, enhancing contextual semantic associations by using a bidirectional long short-term memory network, and acquiring a hidden-layer feature vector containing context information; the text sequence after the preprocessing of step 1 is denoted l = (l_1, l_2, …, l_i, …, l_n), and each word l_i is represented with the pre-trained word vector model GloVe to obtain its embedding, where l_i ∈ R^d, n denotes the length of the sentence, and d denotes the word vector dimension;
step 2.2, the bidirectional long short-term memory network obtains, through a forward and a backward LSTM, the context-dependent information of the text sequence in both directions; the computed forward and backward vectors are concatenated as the output of the hidden layer, expressed as:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
C~_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
C_t = f_t ∗ C_{t-1} + i_t ∗ C~_t
h_t = o_t ∗ tanh(C_t)

where f_t, i_t and o_t denote the forget gate, input gate and output gate at time t respectively, C~_t denotes the candidate memory cell vector at time t, C_t denotes the memory cell vector at time t, h_t denotes the hidden-layer output vector, ∗ denotes element-wise multiplication, W denotes a weight matrix, b denotes a bias vector, and σ(·) and tanh(·) denote the sigmoid and hyperbolic tangent activation functions respectively;
the forward LSTM output h_t^f and the backward LSTM output h_t^b of the bidirectional network are concatenated to obtain the final hidden-layer feature H_t = [h_t^f ; h_t^b].
Further, the specific process of the step 3 is as follows:
step 3.1, constructing a word co-occurrence diagram;
step 3.1.1, the text sequence of length n is denoted W = {w_1, w_2, …, w_i, …, w_n}, where w_i denotes the i-th word in the text sequence, and a sliding window of a specified size is set;
step 3.1.2, the window slides along the text sequence from left to right with w_i as the center word; if w_j falls within the same window as w_i, a connecting edge is built between the word nodes w_i and w_j;
step 3.1.3, the word co-occurrence graph adjacency matrix A_m is expressed as: A_m(i, j) = c_ij if w_i and w_j co-occur within a window, and 0 otherwise, where c_ij denotes the number of times the two words co-occur within the windows;
step 3.2, constructing a dependency syntax diagram;
step 3.2.1, the text sequence is parsed with the Stanford NLP dependency parser to obtain the syntactic dependency tree of the sentence;
step 3.2.2, a graph structure G = (V, E) is constructed with the words of the dependency tree as nodes and the dependency relations as directed edges, where each edge e(i, j) ∈ E represents the dependency from head node x_i to dependent node x_j, V denotes the set of nodes, and E denotes the set of edges;
step 3.2.3, the adjacency matrix A_s is constructed from the dependency syntax graph, expressed as: A_s(i, j) = 1 if there is a dependency edge from x_i to x_j, and 0 otherwise.
further, the specific process of the step 4 is as follows:
step 4.1, the graph convolution operation aggregates and updates each node in the word co-occurrence graph and the dependency syntax graph, taking the features of its neighboring nodes into account and updating the node representation with the neighbors' information; for the word nodes H_w^l and syntax nodes H_s^l at layer l, the word nodes H_w^{l+1} and syntax nodes H_s^{l+1} at layer l+1 are updated as:

H_w^{l+1} = ReLU(Â_m H_w^l W^l)
H_s^{l+1} = ReLU(Â_s H_s^l W^l)

where W^l denotes the trainable weight parameters of the model, ReLU(·) denotes the activation function, and Â denotes the normalized symmetric adjacency matrix, expressed as Â = D^{-1/2}(A + I)D^{-1/2}, where D denotes the degree matrix of A + I;
step 4.2, the updated word node vector representation and dependency syntax node vector representation are concatenated to obtain the global feature information H_C, expressed as H_C = [H_w ; H_s].
further, the specific process of the step 5 is as follows:
step 5.1, the contextual feature information obtained in step 2 and the global feature information obtained in step 4 are fused to obtain a fusion vector, denoted R = [H_t, H_C];
step 5.2, the fusion vector R is linearly transformed to obtain three different matrices Q (Query), K (Key) and V (Value), with Q = K = V = R; Q is dot-multiplied with each key matrix K, the result is scaled, and a softmax function normalizes it into attention weights, with the calculation formula expressed as:

Attention(Q, K, V) = softmax(Q K^T / √d_k) V

where d_k denotes the dimension of the key and softmax(·) maps the input elements to probabilities;
step 5.3, multiple parallel attention computations are performed on the high-dimensional feature vectors through a multi-head self-attention mechanism with e heads, each attention head having its own feature parameters; after the attention information of each head is computed, the e heads are concatenated and linearly transformed to finally obtain the comprehensive feature information fused with attention information, expressed as:

M_i = SelfAttention(Q W^Q, K W^K, V W^V)
MulAtt(Q, K, V) = Concat(M_1, M_2, …, M_e) W

where W^Q, W^K, W^V denote the weight matrices learned by the model during training, Concat(·) denotes concatenation of the feature vectors extracted by the e attention heads, and W denotes the output weight matrix generated in the implementation.
Further, the specific process of the step 6 is as follows:
for a given input sequence x = {x_1, x_2, …, x_n} and predicted label sequence y = {y_1, y_2, …, y_n}, the score is expressed as:

s(x, y) = Σ_{i=0}^{n} A_{y_i, y_{i+1}} + Σ_{i=1}^{n} P_{i, y_i}

where A_{y_i, y_{i+1}} denotes the score of transferring from label y_i to label y_{i+1}, P_{i, y_i} denotes the score of the i-th word being labeled y_i, and s(x, y) denotes the score of the input sentence sequence x being labeled with the label sequence y.
The beneficial effects are that:
the invention acquires topological structure information and syntax dependency information among words by constructing the word co-occurrence graph and the dependency syntax graph, and solves the problem that the word ambiguity of a named entity and the dependency information among words cannot be fully utilized. The context characteristic information is captured by utilizing the two-way long-short-term memory network, the double-graph updating learning is performed through the graph convolution neural network, and a multi-head self-attention mechanism is introduced to perform characteristic fusion, so that the identification accuracy of the medical named entity is effectively improved.
Drawings
FIG. 1 is a flowchart of a dual-graph neural network medical named entity recognition method based on multi-feature fusion in an embodiment of the invention;
FIG. 2 is a schematic diagram of an overall architecture in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a word co-occurrence diagram structure according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a dependency syntax graph structure in an embodiment of the present invention.
Detailed Description
The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments are merely illustrative and do not limit the scope of the invention; various equivalent modifications made by those skilled in the art after reading the invention fall within the scope defined by the appended claims.
The invention discloses a double-graph neural network medical entity identification method based on multi-feature fusion, which comprises the following steps:
Step 1, preprocessing the text sequence; the preprocessing comprises word segmentation, case normalization, removal of non-text content, and removal of stop words and low-frequency words; the data set is then shuffled and divided into a training set and a testing set at a ratio of 7:3.
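The shuffle-and-split of step 1 can be sketched in a few lines of Python; the 7:3 ratio comes from the method, while the fixed seed and the toy corpus are illustrative choices:

```python
import random

def split_dataset(samples, train_ratio=0.7, seed=42):
    """Shuffle the corpus reproducibly, then split it into train/test sets."""
    data = list(samples)
    random.Random(seed).shuffle(data)      # the "shuffle operation" of step 1
    cut = round(len(data) * train_ratio)   # 7:3 split point
    return data[:cut], data[cut:]

sentences = [f"record_{i}" for i in range(10)]   # hypothetical preprocessed records
train, test = split_dataset(sentences)
print(len(train), len(test))   # 7 3
```

A fixed seed keeps the partition reproducible across training runs.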
Step 2, inputting the text sequence into a bidirectional long short-term memory network module, enhancing contextual semantic associations, and acquiring a hidden-layer feature vector containing context information.
Step 2.1, the text sequence after the preprocessing of step 1 is denoted l = (l_1, l_2, …, l_i, …, l_n), and each word l_i is represented with the pre-trained word vector model GloVe to obtain its embedding, where l_i ∈ R^d, n denotes the length of the sentence, and d denotes the word vector dimension.
Step 2.2, the bidirectional long short-term memory network obtains, through a forward and a backward LSTM, the context-dependent information of the text sequence in both directions, ensuring that each word acquires rich semantic information; the computed forward and backward vectors are concatenated as the output of the hidden layer, which can be expressed as:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
C~_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
C_t = f_t ∗ C_{t-1} + i_t ∗ C~_t
h_t = o_t ∗ tanh(C_t)

where f_t, i_t and o_t denote the forget gate, input gate and output gate at time t respectively, C~_t denotes the candidate memory cell vector at time t, C_t denotes the memory cell vector at time t, h_t denotes the hidden-layer output vector, ∗ denotes element-wise multiplication, W denotes a weight matrix, b denotes a bias vector, and σ(·) and tanh(·) denote the sigmoid and hyperbolic tangent activation functions respectively.
The forward LSTM output h_t^f and the backward LSTM output h_t^b of the bidirectional network are concatenated to obtain the final hidden-layer feature H_t = [h_t^f ; h_t^b].
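As an illustration of the gate equations above, the following is a minimal pure-Python sketch of a single LSTM time step on scalar features; the weights and inputs are toy values, and a real implementation would operate on vectors through a deep-learning framework:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step: compute the forget, input and output gates and
    the candidate cell, then the new cell state and hidden output."""
    concat = [h_prev, x_t]                  # [h_{t-1}, x_t]
    def gate(name, act):
        return act(sum(w * v for w, v in zip(W[name], concat)) + b[name])
    f = gate("f", sigmoid)                  # forget gate f_t
    i = gate("i", sigmoid)                  # input gate i_t
    o = gate("o", sigmoid)                  # output gate o_t
    c_tilde = gate("c", math.tanh)          # candidate cell C~_t
    c = f * c_prev + i * c_tilde            # C_t = f_t * C_{t-1} + i_t * C~_t
    h = o * math.tanh(c)                    # h_t = o_t * tanh(C_t)
    return h, c

# toy parameters: every gate shares the weights [0.5, 0.5] and zero bias
W = {k: [0.5, 0.5] for k in "fioc"}
b = {k: 0.0 for k in "fioc"}
h, c = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.0, W=W, b=b)
print(round(h, 3), round(c, 3))
```

A bidirectional network runs this recurrence once left-to-right and once right-to-left and concatenates the two hidden outputs per position.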
And 3, respectively constructing a word co-occurrence diagram and a dependency syntax diagram according to the text sequence.
Step 3.1, constructing a word co-occurrence diagram;
Step 3.1.1, the text sequence of length n is denoted W = {w_1, w_2, …, w_i, …, w_n}, where w_i denotes the i-th word in the text sequence, and a sliding window of a specified size is set;
step 3.1.2, the window slides along the text sequence from left to right with w_i as the center word; if w_j falls within the same window as w_i, a connecting edge is built between the word nodes w_i and w_j, as shown in FIG. 3;
step 3.1.3, the word co-occurrence graph adjacency matrix A_m can be expressed as: A_m(i, j) = c_ij if w_i and w_j co-occur within a window, and 0 otherwise, where c_ij denotes the number of times the two words co-occur within the windows.
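The window-sliding co-occurrence count of steps 3.1.1 to 3.1.3 can be sketched as follows; the token list and window size are illustrative:

```python
from collections import defaultdict

def cooccurrence_matrix(words, window=2):
    """Slide a window over the token sequence and count, for every pair of
    distinct words, how often they appear inside the same window (c_ij)."""
    counts = defaultdict(int)
    for start in range(len(words) - window + 1):
        span = words[start:start + window]
        for i in range(len(span)):
            for j in range(i + 1, len(span)):
                if span[i] != span[j]:
                    counts[(span[i], span[j])] += 1
                    counts[(span[j], span[i])] += 1   # undirected edge
    return counts

tokens = ["patient", "chronic", "gastritis", "chronic", "pain"]   # toy sentence
A = cooccurrence_matrix(tokens, window=2)
print(A[("chronic", "gastritis")])   # 2
```

Pairs that never share a window keep a count of 0, matching the zero entries of A_m.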
And 3.2, constructing a dependency syntax graph.
And 3.2.1, analyzing the text sequence by using a dependency syntax analyzer Stanford NLP to obtain a syntax dependency tree of the sentence.
Step 3.2.2, a graph structure G = (V, E) is constructed with the words of the dependency tree as nodes and the dependency relations as directed edges, where each edge e(i, j) ∈ E represents the dependency from head node x_i to dependent node x_j, as shown in FIG. 4.
Here V denotes the set of nodes and E denotes the set of edges.
Step 3.2.3, the adjacency matrix A_s constructed from the dependency syntax graph can be expressed as: A_s(i, j) = 1 if there is a dependency edge from x_i to x_j, and 0 otherwise.
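A minimal sketch of building the adjacency matrix A_s from (head, dependent) pairs, assuming the parser has already produced the edge list (the word indices below are hypothetical):

```python
def dependency_adjacency(n_words, edges):
    """Build A_s from (head, dependent) index pairs: A_s[i][j] = 1 when a
    dependency edge runs from word i to word j, and 0 otherwise."""
    A = [[0] * n_words for _ in range(n_words)]
    for head, dep in edges:
        A[head][dep] = 1
    return A

# hypothetical parse of a 4-word sentence: word 1 is the root verb
edges = [(1, 0), (1, 3), (3, 2)]
A_s = dependency_adjacency(4, edges)
print(A_s[1][0], A_s[0][1])   # 1 0  (the edges are directed)
```

Keeping the matrix directed preserves the head-to-dependent orientation of the parse.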
and 4, learning the dependency relationship and interaction between nodes according to the node relationship of the graph structure by the graph convolution neural network to obtain the global feature information of the fusion word co-occurrence node feature information and the dependency syntax feature information.
Step 4.1, the graph convolution operation aggregates and updates each node in the word co-occurrence graph and the dependency syntax graph, taking the features of its neighboring nodes into account and updating the node representation with the neighbors' information; for the word nodes H_w^l and syntax nodes H_s^l at layer l, the word nodes H_w^{l+1} and syntax nodes H_s^{l+1} at layer l+1 are updated as:

H_w^{l+1} = ReLU(Â_m H_w^l W^l)
H_s^{l+1} = ReLU(Â_s H_s^l W^l)

where W^l denotes the trainable weight parameters of the model, ReLU(·) denotes the activation function, and Â denotes the normalized symmetric adjacency matrix, which can be expressed as Â = D^{-1/2}(A + I)D^{-1/2}, where D denotes the degree matrix of A + I.
Step 4.2, the updated word node vector representation and dependency syntax node vector representation are concatenated to obtain the global feature information H_C, which can be expressed as H_C = [H_w ; H_s].
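A small sketch of the symmetric normalization and one graph-convolution layer on toy matrices (pure Python standing in for a tensor library; the two-node graph and weights are illustrative):

```python
import math

def normalize_adjacency(A):
    """Â = D^{-1/2} (A + I) D^{-1/2}: add self-loops, then scale each entry
    by the square roots of the two endpoint degrees."""
    n = len(A)
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]           # diagonal of the degree matrix D
    return [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(A_norm, H, W):
    """One graph-convolution layer: ReLU(Â · H · W)."""
    n, d_in = len(H), len(H[0])
    AH = [[sum(A_norm[i][k] * H[k][j] for k in range(n)) for j in range(d_in)]
          for i in range(n)]
    return [[max(0.0, sum(AH[i][k] * W[k][j] for k in range(d_in)))
             for j in range(len(W[0]))] for i in range(n)]

A = [[0, 1], [1, 0]]                # two word nodes joined by one edge
A_norm = normalize_adjacency(A)
H1 = gcn_layer(A_norm, H=[[1.0], [2.0]], W=[[1.0]])
print(A_norm[0][0], H1)             # 0.5 [[1.5], [1.5]]
```

The same layer is applied to both Â_m and Â_s, and the resulting node features are concatenated into H_C.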
and 5, carrying out feature fusion on the context feature information obtained in the step 2 and the global feature information obtained in the step 4, and calculating the dependency relationship inside the feature by adopting a multi-head self-attention mechanism to obtain comprehensive feature information.
Step 5.1, to strengthen the degree of association and the dependencies between the words of a sentence, the contextual feature information obtained in step 2 and the global feature information obtained in step 4 are fused to obtain a fusion vector, denoted R = [H_t, H_C].
Step 5.2, the fusion vector R is linearly transformed to obtain three different matrices Q (Query), K (Key) and V (Value), with Q = K = V = R; Q is dot-multiplied with each key matrix K, the result is scaled, and a softmax function normalizes it into attention weights, with the calculation formula expressed as:

Attention(Q, K, V) = softmax(Q K^T / √d_k) V

where d_k denotes the dimension of the key and softmax(·) maps the input elements to probabilities.
Step 5.3, multiple parallel attention computations are performed on the high-dimensional feature vectors through a multi-head self-attention mechanism with e heads, each attention head having its own feature parameters; after the attention information of each head is computed, the e heads are concatenated and linearly transformed to finally obtain the comprehensive feature information fused with attention information, which can be expressed as:

M_i = SelfAttention(Q W^Q, K W^K, V W^V)
MulAtt(Q, K, V) = Concat(M_1, M_2, …, M_e) W

where W^Q, W^K, W^V denote the weight matrices learned by the model during training, Concat(·) denotes concatenation of the feature vectors extracted by the e attention heads, and W denotes the output weight matrix generated in the implementation.
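The scaled dot-product attention that each head computes can be sketched as follows on a toy fusion matrix R, with Q = K = V = R as in step 5.2 (the 2×2 matrix is illustrative):

```python
import math

def softmax(row):
    m = max(row)                               # stabilize the exponentials
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """softmax(Q · K^T / sqrt(d_k)) · V on list-of-list matrices."""
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in K] for q_row in Q]
    weights = [softmax(row) for row in scores]  # attention weights per query
    return [[sum(w * V[j][c] for j, w in enumerate(row))
             for c in range(len(V[0]))] for row in weights]

R = [[1.0, 0.0], [0.0, 1.0]]                    # toy fused features, Q = K = V = R
out = attention(R, R, R)
print([round(v, 3) for v in out[0]])
```

A multi-head layer would run this with e different learned projections of R and concatenate the e outputs before the final linear map.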
And 6, calculating the comprehensive characteristic information obtained in the step 5 by using a CRF model, and performing sequence labeling to obtain an optimal labeling sequence.
Step 6.1, the CRF is a discriminative probabilistic model that can jointly exploit the input features and the correlations between labels to predict the globally optimal label sequence, greatly improving the performance of the named entity recognition task. For a given sequence x = {x_1, x_2, …, x_n} and predicted label sequence y = {y_1, y_2, …, y_n}, the score can be expressed as:

s(x, y) = Σ_{i=0}^{n} A_{y_i, y_{i+1}} + Σ_{i=1}^{n} P_{i, y_i}

where A_{y_i, y_{i+1}} denotes the score of transferring from label y_i to label y_{i+1}, P_{i, y_i} denotes the score of the i-th word being labeled y_i, and s(x, y) denotes the score of the input sentence sequence x being labeled with the label sequence y.
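The path score s(x, y) can be sketched as below; the emission matrix P and transition matrix A hold hypothetical values, and the special start/end transitions that a full CRF layer adds are omitted for brevity:

```python
def path_score(emissions, transitions, tags):
    """s(x, y): the sum of emission scores P[i][y_i] plus transition scores
    A[y_i][y_{i+1}] along the labeled path (start/end transitions omitted)."""
    score = sum(emissions[i][t] for i, t in enumerate(tags))
    score += sum(transitions[tags[i]][tags[i + 1]] for i in range(len(tags) - 1))
    return score

# 3 words, 2 tags (0 = O, 1 = B-DISEASE); all scores are hypothetical
P = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]   # emission scores per word
A = [[0.5, 0.4], [0.3, 0.6]]               # tag-to-tag transition scores
score = path_score(P, A, tags=[0, 1, 0])
print(round(score, 4))   # 3.1
```

Decoding selects the tag sequence maximizing this score, typically with the Viterbi algorithm.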
The foregoing embodiments are merely illustrative of the technical concept and features of the present invention, and are intended to enable those skilled in the art to understand the present invention and to implement the same, not to limit the scope of the present invention. All equivalent changes or modifications made according to the spirit of the present invention should be included in the scope of the present invention.
Claims (7)
1. A dual-graph neural network medical named entity recognition method based on multi-feature fusion, characterized by comprising the following steps:
step 1: preprocessing data of the text sequence, and dividing a training set and a testing set;
step 2: inputting the text sequence into a bidirectional long short-term memory network module, enhancing contextual semantic associations, and obtaining a hidden-layer feature vector H_t containing context information;
Step 3: respectively constructing a word co-occurrence diagram and a dependency syntax diagram according to the text sequence;
step 4: the graph convolutional neural network learns the dependencies and interactions between nodes from the node relations of the graph structure, obtaining global feature information H_C that fuses the word co-occurrence node features and the dependency syntax features;
Step 5: carrying out feature fusion on the context feature information obtained in the step 2 and the global feature information obtained in the step 4, and calculating the dependency relationship inside the feature by adopting a multi-head self-attention mechanism to obtain comprehensive feature information;
step 6: scoring the comprehensive feature information obtained in step 5 with a CRF model and performing sequence labeling to obtain the optimal label sequence.
2. The dual-graph neural network medical named entity recognition method based on multi-feature fusion according to claim 1, characterized in that the specific method of step 1 is as follows:
step 1.1: performing text word segmentation, text normalization, text cleaning and removal of stop words and low-frequency words on the text sequence;
and 1.2, performing a shuffle operation on the data set, and dividing the data set into a training set and a testing set according to the ratio of 7:3.
3. The dual-graph neural network medical named entity recognition method based on multi-feature fusion according to claim 1, characterized in that the specific method of step 2 is as follows:
step 2.1, enhancing contextual semantic associations by using a bidirectional long short-term memory network, and acquiring a hidden-layer feature vector containing context information; the text sequence after the preprocessing of step 1 is denoted l = (l_1, l_2, …, l_i, …, l_n), and each word l_i is represented with the pre-trained word vector model GloVe to obtain its embedding, where l_i ∈ R^d, n denotes the length of the sentence, and d denotes the word vector dimension;
step 2.2, the bidirectional long short-term memory network obtains, through a forward and a backward LSTM, the context-dependent information of the text sequence in both directions; the computed forward and backward vectors are concatenated as the output of the hidden layer, expressed as:

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
C~_t = tanh(W_C · [h_{t-1}, x_t] + b_C)
C_t = f_t ∗ C_{t-1} + i_t ∗ C~_t
h_t = o_t ∗ tanh(C_t)

where f_t, i_t and o_t denote the forget gate, input gate and output gate at time t respectively, C~_t denotes the candidate memory cell vector at time t, C_t denotes the memory cell vector at time t, h_t denotes the hidden-layer output vector, ∗ denotes element-wise multiplication, W denotes a weight matrix, b denotes a bias vector, and σ(·) and tanh(·) denote the sigmoid and hyperbolic tangent activation functions respectively;
the forward LSTM output h_t^f and the backward LSTM output h_t^b of the bidirectional network are concatenated to obtain the final hidden-layer feature H_t = [h_t^f ; h_t^b].
4. The dual-graph neural network medical named entity recognition method based on multi-feature fusion according to claim 1, characterized in that the specific process of step 3 is as follows:
step 3.1, constructing a word co-occurrence diagram;
step 3.1.1, the text sequence of length n is denoted W = {w_1, w_2, …, w_i, …, w_n}, where w_i denotes the i-th word in the text sequence, and a sliding window of a specified size is set;
step 3.1.2, the window slides along the text sequence from left to right with w_i as the center word; if w_j falls within the same window as w_i, a connecting edge is built between the word nodes w_i and w_j;
step 3.1.3, the word co-occurrence graph adjacency matrix A_m is expressed as: A_m(i, j) = c_ij if w_i and w_j co-occur within a window, and 0 otherwise, where c_ij denotes the number of times the two words co-occur within the windows;
step 3.2, constructing a dependency syntax diagram;
step 3.2.1, the text sequence is parsed with the Stanford NLP dependency parser to obtain the syntactic dependency tree of the sentence;
step 3.2.2, a graph structure G = (V, E) is constructed with the words of the dependency tree as nodes and the dependency relations as directed edges, where each edge e(i, j) ∈ E represents the dependency from head node x_i to dependent node x_j, V denotes the set of nodes, and E denotes the set of edges;
step 3.2.3, the adjacency matrix A_s is constructed from the dependency syntax graph, expressed as: A_s(i, j) = 1 if there is a dependency edge from x_i to x_j, and 0 otherwise.
5. The dual-graph neural network medical named entity recognition method based on multi-feature fusion according to claim 1, characterized in that the specific process of step 4 is as follows:
step 4.1, aggregate and update the nodes in the word co-occurrence graph and the dependency syntax graph; the graph convolution operation considers the neighbor-node characteristics of each node and updates a node's feature representation using the information of its neighbor nodes; for the layer-l word nodes H_w^l and syntax nodes H_s^l, the layer-(l+1) word nodes H_w^(l+1) and syntax nodes H_s^(l+1) are updated as follows:

H_w^(l+1) = ReLU(Â H_w^l W^l)
H_s^(l+1) = ReLU(Â H_s^l W^l)

wherein W^l represents the model's trainable weight parameters, ReLU(·) represents the activation function, and Â represents the normalized symmetric adjacency matrix of the corresponding graph (A_m or A_s), expressed as:

Â = D^(-1/2) A D^(-1/2)

wherein D represents the degree matrix;
step 4.2, concatenate the updated word-node vector representation H_w and the dependency-syntax-node vector representation H_s to obtain the global feature information H_C, expressed as:

H_C = [H_w, H_s]
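Steps 4.1 and 4.2 can be sketched with NumPy as below. This is a minimal illustration under one stated assumption: self-loops (A + I) are added before normalization, a common graph-convolution convention that the claim's formula Â = D^(-1/2) A D^(-1/2) does not spell out; the function names are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization of step 4.1: D^(-1/2) (A + I) D^(-1/2).

    Self-loops are added so each node's own features take part in the
    aggregation; D is the degree matrix of the self-looped graph.
    """
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W):
    """One graph-convolution update: H_next = ReLU(A_norm @ H @ W)."""
    return np.maximum(0.0, A_norm @ H @ W)

def fuse_global_features(H_w, H_s):
    """Step 4.2: concatenate word-graph and syntax-graph representations."""
    return np.concatenate([H_w, H_s], axis=-1)
```

The same `gcn_layer` is applied to the co-occurrence graph (with Â from A_m) and the dependency graph (with Â from A_s) before fusing.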
6. The dual-graph neural network medical named entity recognition method based on multi-feature fusion according to claim 1, wherein the specific process of step 5 is as follows:
step 5.1, perform feature fusion on the context feature information H_t obtained in step 2 and the global feature information H_C obtained in step 4 to obtain a fusion vector, recorded as R = [H_t, H_C];
step 5.2, perform a linear transformation on the fusion vector R to obtain 3 different vector matrices Q (Query), K (Key) and V (Value), wherein Q = K = V = R; perform a dot-product operation between Q and each key matrix K, scale the result, and normalize with the softmax function to obtain the attention weights, with the calculation formula expressed as:

Attention(Q, K, V) = softmax(QK^T / √d_k) V

wherein d_k represents the dimension of the key and softmax(·) maps the input elements to probabilities;
step 5.3, perform e parallel attention computations on the high-dimensional feature vectors through a multi-head self-attention mechanism, each attention head having different feature parameters; after the attention information of each head is computed, the e heads are concatenated and linearly transformed, finally yielding the comprehensive feature information fused with the attention information, expressed as:
M_i = SelfAttention(QW_Q, KW_K, VW_V)
MulAtt(Q, K, V) = Concat(M_1, M_2, ..., M_e) W
wherein W_Q, W_K, W_V represent weight matrices learned by the model during training, Concat(·) represents concatenation of the feature vectors extracted by the e attention heads, and W represents a weight matrix generated in the implementation process.
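Steps 5.2 and 5.3 can be sketched in NumPy as follows, a minimal illustration rather than the claimed implementation: it assumes each head carries its own (W_Q, W_K, W_V) triple, and the function and parameter names are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(Q, K, V):
    """Scaled dot-product attention of step 5.2: softmax(QK^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head_attention(R, head_weights, W_out):
    """Steps 5.2-5.3 with Q = K = V = R.

    `head_weights` is a list of e (W_Q, W_K, W_V) triples; each head
    projects R, the e outputs are concatenated and mapped by W_out.
    """
    M = [self_attention(R @ WQ, R @ WK, R @ WV) for WQ, WK, WV in head_weights]
    return np.concatenate(M, axis=-1) @ W_out
```

With e heads of dimension d_k, the concatenated output has width e·d_k before the final linear map W_out restores the model dimension.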
7. The dual-graph neural network medical named entity recognition method based on multi-feature fusion according to claim 6, wherein the specific process of step 6 is as follows:
for a given sequence x = {x_1, x_2, ..., x_n} and predicted sequence y = {y_1, y_2, ..., y_n}, the calculated score is expressed as:

s(x, y) = Σ_i T_(y_i, y_(i+1)) + Σ_i P_(i, y_i)

wherein T_(y_i, y_(i+1)) represents the probability of transferring from y_i to y_(i+1), P_(i, y_i) represents the probability that the i-th word is labeled y_i, and s(x, y) represents the probability that the input sentence sequence x is labeled with the label sequence y.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311317519.8A CN117408255A (en) | 2023-10-11 | 2023-10-11 | A dual-graph neural network medical named entity recognition method based on multi-feature fusion |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117408255A true CN117408255A (en) | 2024-01-16 |
Family
ID=89488146
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311317519.8A Pending CN117408255A (en) | 2023-10-11 | 2023-10-11 | A dual-graph neural network medical named entity recognition method based on multi-feature fusion |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117408255A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119673429A (en) * | 2024-12-05 | 2025-03-21 | 济南凯信医疗科技有限公司 | An Internet-based online gynecological disease intelligent consultation system |
Similar Documents
| Publication | Title |
|---|---|
| CN112818676B (en) | A joint extraction method of medical entity relationships |
| CN108984724B (en) | Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation |
| CN110765775B (en) | A Domain Adaptation Method for Named Entity Recognition Fusing Semantics and Label Differences |
| CN112364174A (en) | Patient medical record similarity evaluation method and system based on knowledge graph |
| CN111914556B (en) | Emotion guiding method and system based on emotion semantic transfer pattern |
| CN106980608A (en) | A Chinese electronic medical record word segmentation and named entity recognition method and system |
| CN111554360A (en) | Drug relocation prediction method based on biomedical literature and domain knowledge data |
| CN109871538A (en) | A Chinese electronic health record named entity recognition method |
| CN111950283B (en) | Chinese word segmentation and named entity recognition system for large-scale medical text mining |
| CN114021584B (en) | Knowledge representation learning method based on graph convolution network and translation model |
| CN114781382B (en) | Medical named entity recognition system and method based on RWLSTM model fusion |
| Ke et al. | Medical entity recognition and knowledge map relationship analysis of Chinese EMRs based on improved BiLSTM-CRF |
| CN108959566B (en) | A medical text de-privacy method and system based on Stacking ensemble learning |
| CN111078875A (en) | Method for extracting question-answer pairs from semi-structured document based on machine learning |
| CN111274790A (en) | Text-level event embedding method and device based on syntactic dependency graph |
| CN111222318A (en) | Trigger word recognition method based on two-channel bidirectional LSTM-CRF network |
| CN114077673A (en) | Knowledge graph construction method based on BTBC model |
| Ren et al. | Detecting the scope of negation and speculation in biomedical texts by using recursive neural network |
| Zhang et al. | Using a pre-trained language model for medical named entity extraction in Chinese clinic text |
| CN116108840A (en) | A text fine-grained sentiment analysis method, system, medium and computing device |
| CN117891958B (en) | A standard data processing method based on knowledge graph |
| CN111125378A (en) | Closed-loop entity extraction method based on automatic sample labeling |
| CN116680407B (en) | A method and apparatus for constructing a knowledge graph |
| CN114373512B (en) | Protein interaction relationship extraction method based on Gaussian enhancement and auxiliary tasks |
| CN117408255A (en) | A dual-graph neural network medical named entity recognition method based on multi-feature fusion |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | | |
| SE01 | Entry into force of request for substantive examination | | |