CN114429132A - Named entity identification method and device based on mixed lattice self-attention network - Google Patents
Named entity identification method and device based on mixed lattice self-attention network
- Publication number
- CN114429132A (Application CN202210172667.4A)
- Authority
- CN
- China
- Prior art keywords
- word
- vector
- network
- self
- mixed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 80
- 239000013598 vector Substances 0.000 claims abstract description 195
- 239000011159 matrix material Substances 0.000 claims abstract description 35
- 230000008569 process Effects 0.000 claims abstract description 28
- 230000004927 fusion Effects 0.000 claims abstract description 19
- 238000002372 labelling Methods 0.000 claims abstract description 16
- 238000012549 training Methods 0.000 claims description 20
- 230000006870 function Effects 0.000 claims description 11
- 238000004364 calculation method Methods 0.000 claims description 9
- 238000013507 mapping Methods 0.000 claims description 5
- 238000013528 artificial neural network Methods 0.000 claims description 3
- 230000002457 bidirectional effect Effects 0.000 claims description 3
- 210000002569 neuron Anatomy 0.000 claims description 3
- 238000010606 normalization Methods 0.000 claims description 3
- 238000000926 separation method Methods 0.000 claims description 3
- 230000003068 static effect Effects 0.000 claims description 3
- 230000002708 enhancing effect Effects 0.000 claims 2
- 230000006872 improvement Effects 0.000 description 6
- 230000007246 mechanism Effects 0.000 description 6
- 230000000694 effects Effects 0.000 description 4
- 238000002474 experimental method Methods 0.000 description 4
- 230000000873 masking effect Effects 0.000 description 4
- 230000000052 comparative effect Effects 0.000 description 3
- 238000013461 design Methods 0.000 description 3
- 238000013135 deep learning Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 238000007499 fusion processing Methods 0.000 description 2
- 239000000383 hazardous chemical Substances 0.000 description 2
- 238000007500 overflow downdraw method Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000013145 classification model Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003058 natural language processing Methods 0.000 description 1
- 230000008520 organization Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Machine Translation (AREA)
Abstract
Description
Technical Field
The present invention relates to the technical field of natural language processing within artificial intelligence, and in particular to a named entity recognition method and device based on a mixed-lattice self-attention network.
Background
Named entity recognition (NER), also called entity extraction, was first proposed at the MUC-6 conference; it is the information-extraction technique that extracts entities from text. Early entity recognition relied on rule-based and statistical methods, but because these traditional approaches depend heavily on manual design and suffer from low coverage and low recognition accuracy, they have long since been replaced by deep learning methods. Deep-learning entity recognition models are divided into character-based and word-based models. For languages such as English, word-based models are usually adopted because each word has a clear meaning; in Chinese the meaning of a single character is ambiguous while the meaning of a word is specific, so Chinese NER methods adopt word-based models. To better represent each character vector in Chinese, scholars later proposed methods based on representation learning, a learning paradigm that converts human language into features a machine can recognize and improves the accuracy of semantic representation in machine learning.
In named entity recognition, external lexical information can effectively improve recognition accuracy, but such methods depend on the performance of the fusion algorithm. For example, the invention with patent number CN113836930A proposes a Chinese NER method for hazardous chemicals: on top of a BiLSTM-CRF model, the pre-trained language model BERT is used to obtain character-level encodings of texts in the hazardous-chemicals domain, yielding context-aware character vectors, and an attention mechanism is then introduced to strengthen the model's ability to mine global and local features of the text. The invention with patent number CN113128232A proposes an NER method based on ALBERT and multiple word-information embeddings, which can effectively represent the polysemy of characters and improve the efficiency of entity recognition. The invention with patent number CN111310470A discloses a Chinese NER method that fuses character and word features; the results obtained after comprehensive analysis strengthen the model's understanding of the text and raise the F1 score on the recognition task.
Although existing methods have achieved good results in fusing word feature vectors, they still suffer from the following problems: 1) character-word feature fusion methods do not take into account the differences in semantic expression between character and word vectors trained by different models, so naively fusing the two cannot effectively enhance the word-level features of the character vectors; 2) lexicon-enhancement methods based on learned word weights consider only the influence of each character's matched words on that character's semantic representation and ignore the role of global lexical information.
Summary of the Invention
To address the deficiencies of the prior art, the present invention provides a named entity recognition method and device based on a mixed-lattice self-attention network. Based on the idea of representation learning, the proposed model fuses lexical information to enhance the feature representation of the character vectors, so that the generated character vectors contain more entity-boundary information, thereby improving the accuracy of the NER task.
To achieve the above object, the present invention adopts the following technical solution:
A named entity recognition method based on a mixed-lattice self-attention network, the method comprising the following steps:
S1: look up in a dictionary the words formed by consecutive characters of the input sentence, merge them into a single multi-dimensional vector through alternating positional mapping, and encode the sentence feature vector represented by character-word pairs into a matrix of fixed dimension using mixed character-word lattice encoding, obtaining the character-word vector representation of the corresponding mixed-lattice structure;
S2: based on the mixed-lattice character-word vector generated in step S1, construct a self-attention network to capture the influence of the word vectors in this representation on the character vectors, thereby enhancing the feature representation of each character vector;
S3: fuse the word features in the embedding layer of BERT and learn better character vector representations through the fine-tuning process; use a BiLSTM-CRF network to perform the entity sequence-labeling and decoding tasks of entity recognition and to model the fused character features, thereby completing the construction of the entity recognition model based on the mixed-lattice self-attention network;
S4: train the entity recognition model based on the mixed-lattice self-attention network on the datasets.
To optimize the above technical solution, the specific measures adopted further include:
Further, in step S1, the process of encoding the sentence feature vector represented by character-word pairs into a fixed-dimension matrix by mixed character-word lattice encoding and obtaining the character-word vector representation of the corresponding mixed-lattice structure comprises the following steps:
S11: given a sentence s_c = {c_1, c_2, ..., c_n}, load the pre-trained BERT weights to obtain the character feature vector representation of s_c, C = {e_B(c_1), e_B(c_2), ..., e_B(c_n)}, where c_i denotes the i-th character of s_c, n denotes the character length of s_c, and e_B denotes the lookup table of BERT pre-trained character vectors;
S12: given a Chinese dictionary L, construct a Trie tree and traverse its nodes to obtain the words matched by each character;
S13: group all matched words by BMES tags, i.e., for a character c_i the word set B(c_i) consists of the matched words beginning with c_i, the set M(c_i) consists of the matched words in which c_i is an internal character, the set E(c_i) consists of the matched words ending with c_i, and the set S(c_i) consists of the single-character word formed by c_i; the word set w_i of each character c_i in sentence s_c is expressed as:
w_i = {e_w(B(c_i)), e_w(M(c_i)), e_w(E(c_i)), e_w(S(c_i))};
where e_w denotes the pre-trained word vector lookup table;
S14: set up two learnable non-linear fully connected layers to raise the dimension of w_i to that of the character vectors; during fine-tuning, BERT learns the weights of these two layers so that the pre-trained word feature vectors are mapped into BERT's semantic feature space; the processed word feature vector is expressed as:
v_i^w = W_1 σ(W_2 w_i + b_2) + b_1;
where σ is a non-linear activation, W_1 ∈ R^(d_c×d_c) and W_2 ∈ R^(d_c×d_w) are learnable weight matrices, b_1 and b_2 are the corresponding biases, d_c denotes the dimension of the BERT character vectors, and d_w denotes the dimension of the pre-trained word vectors;
S15: take the transformed word feature vectors v_i^w as the input of the feature fusion model and, according to the correspondence between characters and word sets, form each character-word pair feature I_i from the character vector e_B(c_i) and its word-set vector v_i^w;
S16: interleave the character-word pair features of all positions into the mixed-lattice representation V_ME, in which the character vectors and word vectors are concatenated in alternating positions.
Further, in step S2, the process of constructing, on the basis of the mixed-lattice character-word vector generated in step S1, a self-attention network that captures the influence of the word vectors on the character vectors so as to enhance the feature representation of each character vector comprises the following steps:
S21: design a mixed-lattice self-attention network to capture the associations between character and word features; the self-attention network takes the mixed character-word encoding vector V_ME and the word-position mask matrix M as the inputs of the enhancement network, and by modeling the global word vectors and character vectors it makes the model learn the word-sense correlation weights between words and characters; the Q, K and V matrices are computed as:
[Q, K, V] = [W_q V_ME, W_k V_ME, W_v V_ME];
where W_q, W_k and W_v are learnable weight matrices and d_e = d_c + d_w; Q, K and V are, respectively, the query matrix, the key matrix corresponding to the queries, and the value matrix to be weighted and averaged; d_e denotes the dimension of the mixed-lattice vector, d_c the dimension of the character vectors and d_w the dimension of the word vectors;
S22: use the scaled dot product as the similarity score and compute the attention output:
S_Att = QK^T / sqrt(d_e);
F_Att = Softmax(S_Att + εM)V;
where M is the static word-position mask matrix, ε is a value approaching negative infinity, and F_Att is the output of the self-attention network; S_Att denotes the normalized attention scores and K^T denotes the transpose of the matrix K;
S23: add the word feature information to the BERT pre-trained character vectors as a residual, obtaining the lexicon-enhanced character feature vectors:
C′ = C + g(F_Att);
where C denotes the BERT pre-trained character vector features, and the function g(·) removes the word-vector channels from the self-attention output to keep the dimensions of C and F_Att consistent, yielding the lexicon-enhanced character embedding vector C′.
Further, in step S3, the process of constructing the entity recognition model based on the mixed-lattice self-attention network comprises the following steps:
S31: given a sentence sequence s_c = {c_1, c_2, ..., c_n} of length n, denote the lexicon-enhanced character vectors by C′ = {c′_1, c′_2, ..., c′_n} and fine-tune C′ in the BERT model; the lexicon-enhanced BERT character embedding vector is expressed as:
E′_i = C′_i + E_s(i) + E_p(i);
where E_s and E_p denote the segment-vector and position-vector lookup tables, respectively, and i denotes the i-th character of the character sequence s_c of length n;
S32: feed the resulting E′ into BERT; each transformer block is computed as:
D = LN(H_{k-1} + MHA(H_{k-1}));
H_k = LN(FFN(D) + D);
where H_k denotes the hidden-state output of the k-th layer and H_0 = E′ denotes the bottom-layer character vectors; LN is the layer-normalization function; MHA is the multi-head self-attention module; FFN is a two-layer feed-forward neural network; D denotes the normalized output vector of the multi-head attention module;
S33: obtain the hidden-state output vector H of the last transformer layer and feed it into a bidirectional LSTM network that captures semantic information of the sentence from left to right and from right to left, respectively; the hidden-state output of the forward LSTM is denoted h_i^fw and that of the backward LSTM h_i^bw; the output of the BiLSTM network, which is the output of the sequence-labeling layer, is expressed as:
h_i = [h_i^fw ; h_i^bw];
where h_i denotes the concatenated hidden-state output of the i-th BiLSTM neuron and is used as the character-level contextual semantic representation of c_i;
S34: use a standard CRF layer to predict the NER tags; given the hidden-state output vectors of the last layer of the network H = {h_1, h_2, ..., h_n}, and letting y = {y_1, y_2, ..., y_n} denote a tag sequence, for a sentence s = {s_1, s_2, ..., s_n} the probability of its corresponding tag sequence is defined as:
P(y | s) = exp( Σ_{i=1..n} (W_{y_i} h_i + b_{(y_{i-1}, y_i)}) ) / Σ_{y′} exp( Σ_{i=1..n} (W_{y′_i} h_i + b_{(y′_{i-1}, y′_i)}) );
where y′ ranges over all possible tag sequences; W_{y_i} denotes the learnable weight parameters of the network corresponding to y_i, and b_{(y_{i-1}, y_i)} denotes the bias (transition score) between y_{i-1} and y_i; likewise, W_{y′_i} and b_{(y′_{i-1}, y′_i)} denote the model weight parameters and biases under any possible tag sequence y′;
S35: take the negative log-likelihood as the loss function of the model, expressed as:
L = − Σ log P(y | s), summed over the training sentences.
Based on the foregoing named entity recognition method, the present invention proposes a named entity recognition device based on a mixed-lattice self-attention network. The device comprises a mixed-lattice structure encoding module, a lexicon enhancement module, a sequence-labeling and decoding module, and a model training module.
The mixed-lattice structure encoding module is used to look up in a dictionary the words formed by consecutive characters of the input sentence, merge them into a single multi-dimensional vector through alternating positional mapping, and encode the sentence feature vector represented by character-word pairs into a fixed-dimension matrix using mixed character-word lattice encoding, obtaining the character-word vector representation of the corresponding mixed-lattice structure.
The lexicon enhancement module is used to construct, on the basis of the generated mixed-lattice character-word vector, a self-attention network that captures the influence of the word vectors on the character vectors, thereby enhancing the feature representation of each character vector.
The sequence-labeling and decoding module is used to fuse the word features in the embedding layer of BERT and learn better character vector representations through the fine-tuning process, and to perform the entity sequence-labeling and decoding tasks of entity recognition with a BiLSTM-CRF network, which models the fused character features and completes the construction of the entity recognition model based on the mixed-lattice self-attention network.
The model training module is used to train the entity recognition model based on the mixed-lattice self-attention network on the datasets.
The beneficial effects of the present invention are as follows:
The named entity recognition method and device based on a mixed-lattice self-attention network capture global lexical information by constructing a fusion network and generate semantically rich character vector representations, improving the accuracy of Chinese named entity recognition on multiple datasets. Compared with the BERT baseline model without the lexicon enhancement network, the invention achieves performance gains of 4.55%, 0.54%, 1.82% and 0.91% on four datasets, respectively, which shows that enhancing the feature representation of character vectors with lexical information is an effective way to improve Chinese NER performance. Compared with other lexicon-enhancement methods, the proposed feature fusion framework (MELSN) effectively fuses richer lexical semantic features; with the fine-tuning mechanism of the pre-trained BERT model, the lexicon-enhanced character representations contain more lexical semantics. The comparative experimental results show that the invention makes better use of the character-word lattice structure for lexicon enhancement and demonstrate the efficiency of the proposed method on the Chinese named entity recognition task.
Description of the Drawings
Figure 1 is a schematic structural diagram of the named entity recognition device based on a mixed-lattice self-attention network of the present invention.
Detailed Description of the Embodiments
The present invention is now described in further detail with reference to the accompanying drawings.
It should be noted that terms such as "upper", "lower", "left", "right", "front" and "rear" cited in the invention are used only for clarity of description and are not intended to limit the scope within which the invention can be implemented; changes or adjustments of their relative relationships, without substantive alteration of the technical content, shall also be regarded as falling within the implementable scope of the invention.
The present invention relates to a named entity recognition method based on a mixed-lattice self-attention network, which comprises the following steps:
S1: look up in a dictionary the words formed by consecutive characters of the input sentence, merge them into a single multi-dimensional vector through alternating positional mapping, and encode the sentence feature vector represented by character-word pairs into a fixed-dimension matrix using mixed character-word lattice encoding, obtaining the character-word vector representation of the corresponding mixed-lattice structure.
S2: based on the mixed-lattice character-word vector generated in step S1, construct a self-attention network to capture the influence of the word vectors in this representation on the character vectors, thereby enhancing the feature representation of each character vector.
S3: fuse the word features in the embedding layer of BERT and learn better character vector representations through the fine-tuning process; use a BiLSTM-CRF network to perform the entity sequence-labeling and decoding tasks of entity recognition and to model the fused character features, completing the construction of the entity recognition model based on the mixed-lattice self-attention network.
S4: train the entity recognition model based on the mixed-lattice self-attention network on the datasets.
Based on the foregoing method, this embodiment also provides a named entity recognition device based on a mixed-lattice self-attention network. Figure 1 is a schematic structural diagram of the device. The architecture is divided into three parts: a mixed-lattice encoding module, a lexicon enhancement module, and a sequence-labeling and decoding module. The first module completes the encoding of the character-word pair vectors: it searches all words in the dictionary tree, loads the pre-trained character and word vectors, encodes the character-word pairs into the mixed-lattice embedding, and passes this embedding, together with the word mask generated at this stage, to the next module. The second module performs the lexicon enhancement process with the proposed self-attention model, and the enhanced character vector representations are passed into the BERT model for fine-tuning. The last module models the enhanced and fine-tuned character vectors and completes the tag prediction and decoding process for each character vector.
The present invention proposes a named entity recognition method based on lexicon enhancement. The model is an improvement of the BERT-BiLSTM-CRF network: a lexicon enhancement module based on an attention network is added to the embedding layer of BERT. For encoding the character-word pair vectors, an interleaved character-word vector encoding scheme is designed; after lexicon enhancement through the attention network, the fused features are normalized and added to the original BERT character vectors as a residual. The specific implementation of the invention is as follows:
Step 1: Constructing the character-word vectors of the mixed-lattice structure
The purpose of mixed-lattice encoding is to encode the sentence feature vector represented by character-word pairs into a fixed-dimension matrix. The words formed by consecutive characters of the input sentence are first looked up in the dictionary and then merged into a single multi-dimensional vector through alternating positional mapping.
Specifically, given a sentence s_c = {c_1, c_2, ..., c_n}, the character feature vector representation of s_c, C = {e_B(c_1), e_B(c_2), ..., e_B(c_n)}, is obtained directly by loading the pre-trained BERT weights, where c_i denotes the i-th character of s_c, n denotes the character length of s_c, and e_B denotes the lookup table of BERT pre-trained character vectors. Given a Chinese dictionary L, a Trie tree is first constructed, and traversing its nodes yields the words matched by each character. All matched words are grouped by "BMES" tags: for a character c_i, the word set B(c_i) consists of the matched words that begin with c_i; similarly, the set M(c_i) consists of the matched words in which c_i is an internal character, the set E(c_i) consists of the matched words ending with c_i, and the set S(c_i) consists of the single-character word formed by c_i. The word set w_i of each character c_i in sentence s_c can therefore be expressed as:
w_i = {e_w(B(c_i)), e_w(M(c_i)), e_w(E(c_i)), e_w(S(c_i))};
where e_w denotes the pre-trained word vector lookup table. To keep the vector dimensions consistent, two learnable non-linear fully connected layers are set up to raise the dimension of w_i to that of the character vectors; during fine-tuning, BERT learns the weights of these two layers, which maps the pre-trained word feature vectors into BERT's semantic feature space. The processed word feature vector is expressed as:
v_i^w = W_1 σ(W_2 w_i + b_2) + b_1;
where σ is a non-linear activation, W_1 ∈ R^(d_c×d_c) and W_2 ∈ R^(d_c×d_w) are learnable weight matrices, and b_1 and b_2 are the corresponding biases; d_c denotes the dimension of the BERT character vectors and d_w the dimension of the pre-trained word vectors. In our feature fusion method, the transformed word feature vectors v_i^w serve as the input of the feature fusion model. According to the correspondence between characters and word sets, each character-word pair feature I_i is formed from the character vector e_B(c_i) and its word-set vector v_i^w.
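By way of illustration only, the following is a minimal Python sketch (not the claimed implementation) of how the dictionary matching described above could be realized: a trie collects the words matched at each character, the matches are grouped into the BMES sets, and two fully connected layers project the word features to the character-vector dimension. The class and variable names (Trie, bmes_word_sets, d_c, d_w) and the tanh activation are illustrative assumptions.

```python
import torch.nn as nn

class Trie:
    """Minimal trie over the dictionary words, used for multi-character matching."""
    def __init__(self, words):
        self.root = {}
        for w in words:
            node = self.root
            for ch in w:
                node = node.setdefault(ch, {})
            node["#"] = True  # end-of-word marker

    def match_from(self, sent, start):
        """Return every dictionary word in `sent` that begins at position `start`."""
        node, matches = self.root, []
        for j in range(start, len(sent)):
            node = node.get(sent[j])
            if node is None:
                break
            if "#" in node:
                matches.append(sent[start:j + 1])
        return matches

def bmes_word_sets(sent, trie):
    """For each character c_i, collect its matched words grouped as B / M / E / S."""
    sets = [{"B": [], "M": [], "E": [], "S": []} for _ in sent]
    for i in range(len(sent)):
        for w in trie.match_from(sent, i):
            j = i + len(w) - 1
            if len(w) == 1:
                sets[i]["S"].append(w)       # single-character word
            else:
                sets[i]["B"].append(w)       # word begins at c_i
                sets[j]["E"].append(w)       # word ends at c_j
                for k in range(i + 1, j):
                    sets[k]["M"].append(w)   # c_k is an internal character
    return sets

# Two learnable non-linear fully connected layers mapping the pooled word-set
# vector (dimension d_w) into BERT's character-vector space (dimension d_c).
d_c, d_w = 768, 200  # illustrative dimensions
word_proj = nn.Sequential(nn.Linear(d_w, d_c), nn.Tanh(), nn.Linear(d_c, d_c))
```

For example, bmes_word_sets("南京市长江大桥", Trie(["南京", "南京市", "市长", "长江", "大桥", "长江大桥"])) assigns "南京市" to the B set of the first character and to the E set of the third.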
Many existing lexicon-fusion NER methods feed the character-word pair features I_i directly into the lexicon enhancement network; such methods can only fuse local lexical information. In our experiments we found that, within the sentence s_c, the matched words of the other characters also carry rich lexical semantics. To enable the model to capture global word-level features, this embodiment proposes a new character-word pair encoding, the character-word mixed-lattice embedding, in which the character-word pair features of all positions are interleaved and concatenated into the representation V_ME.
Based on this encoding, the character and word vectors are cross-encoded into a feature representation of fixed dimension; the closer a word vector is arranged to a character, the higher its relevance to that character, which accords with practice. In the next stage, this embodiment constructs a fusion network based on the attention mechanism to model the character-word pair feature representation V_ME.
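Continuing the sketch above, one plausible reading of this interleaved encoding is a sequence in which every character vector is immediately followed by its projected word-set vector, together with a mask marking the word positions for the attention stage; the exact layout used by the invention is described only in prose here, so the alternating arrangement below is an assumption.

```python
import torch

def mixed_lattice_embedding(char_vecs, word_vecs):
    """Interleave character vectors and their projected word-set vectors.

    char_vecs: (n, d) BERT character vectors
    word_vecs: (n, d) word-set vectors already projected to the same dimension
    Returns the mixed-lattice sequence (2n, d) and a boolean mask over word rows.
    """
    n, d = char_vecs.shape
    v_me = torch.empty(2 * n, d)
    v_me[0::2] = char_vecs    # even rows: the characters c_1 ... c_n
    v_me[1::2] = word_vecs    # odd rows: the words matched at each character
    word_mask = torch.zeros(2 * n, dtype=torch.bool)
    word_mask[1::2] = True    # True at word positions
    return v_me, word_mask
```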
Step 2: Computation of the word feature fusion network
The role of the lexicon enhancement network is to model the character-word feature representation V_ME and to use the word features to enhance the feature representation of the character vectors in the sequence. Following the design of the attention mechanism proposed by Vaswani et al., a mixed-lattice self-attention network is designed to capture the associations between character and word features. In the previous stage we obtained the mixed-lattice vector representation V_ME that encodes the character-word pairs; this lattice structure tightly organizes the character and word vectors into a single multi-dimensional vector, with character and word features distributed in alternation within the embedding. The self-attention network takes V_ME and the word-position mask matrix M as the inputs of the enhancement network, and through its modeling the model can learn the word-sense correlation weights between words and characters. Given the mixed character-word encoding vector V_ME and the matrix M of a sentence, the Q, K and V matrices are computed as:
[Q, K, V] = [W_q V_ME, W_k V_ME, W_v V_ME];
where W_q, W_k and W_v are learnable weight matrices and d_e = d_c + d_w. The scaled dot product is then used as the similarity score, and the attention output is computed as:
S_Att = QK^T / sqrt(d_e);
F_Att = Softmax(S_Att + εM)V;
where M is the static word-position mask matrix, ε is a value approaching negative infinity, and F_Att is the output of the self-attention network. M is multiplied element-wise by this infinitesimally small value to suppress the attention scores obtained at the word positions: after the Softmax computation the probabilities at the word positions are all zero, while the character positions carry the weight score of each word. The word-position mask is thus used inside the self-attention network to complete the fusion of character and word features.
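The following sketch mirrors the two formulas above: linear maps produce Q, K and V from V_ME, a scaled dot product gives the similarity scores, and the static word-position mask M is added with a very large negative coefficient before the Softmax. How the entries of M are constructed follows the word-position scheme described in the text and is left as an input here; the 1/sqrt(d_e) scaling and the -1e9 masking constant are conventional choices, not values quoted from the patent.

```python
import math
import torch
import torch.nn as nn

class MixedLatticeSelfAttention(nn.Module):
    def __init__(self, d_e):
        super().__init__()
        self.w_q = nn.Linear(d_e, d_e, bias=False)
        self.w_k = nn.Linear(d_e, d_e, bias=False)
        self.w_v = nn.Linear(d_e, d_e, bias=False)
        self.d_e = d_e

    def forward(self, v_me, m):
        # v_me: (L, d_e) mixed-lattice embedding of one sentence
        # m:    (L, L) static word-position mask, 1 where scores must be suppressed
        q, k, v = self.w_q(v_me), self.w_k(v_me), self.w_v(v_me)
        s_att = q @ k.t() / math.sqrt(self.d_e)                  # similarity scores S_Att
        f_att = torch.softmax(s_att + (-1e9) * m, dim=-1) @ v    # F_Att
        return f_att
```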
Through the above process, the word features are integrated: the word feature information is added to the BERT pre-trained character vectors as a residual, and the resulting lexicon-enhanced character feature vectors are:
C′ = C + g(F_Att);
where C denotes the BERT pre-trained character vector features; the function g(·) removes the word-vector channels from the self-attention output so that the dimensions of C and F_Att remain consistent, yielding the lexicon-enhanced character embedding vector C′.
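A small sketch of the residual fusion C′ = C + g(F_Att), under the assumption carried over from the interleaved layout sketched earlier that g removes the rows of F_Att at word positions so that the result aligns one-to-one with the BERT character sequence; if the character and word rows had different widths, an additional projection would be needed.

```python
def lexicon_enhanced_chars(c, f_att, word_mask):
    """C' = C + g(F_Att): drop the word-position rows of the attention output
    and add the remaining character rows to the BERT character vectors.

    c:         (n, d)  BERT pre-trained character vectors
    f_att:     (2n, d) output of the mixed-lattice self-attention
    word_mask: (2n,)   boolean mask, True at word positions
    """
    g_f_att = f_att[~word_mask]   # g(*): keep only the character rows
    return c + g_f_att            # (n, d) lexicon-enhanced character embeddings
```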
Through the above process, the character-word pair vectors are re-encoded and the global word vectors and character vectors are modeled with the dedicated self-attention network, producing the fused character feature vectors and achieving the fusion of global word feature information.
Step 3: Constructing the named entity recognition model
1) Computation in the BERT structure
This embodiment is an improvement of the BERT-BiLSTM network: the embedding layer of BERT is feature-enhanced, and a mechanism for integrating word features into the embedding layer is proposed. In step 2 we obtained the lexicon-enhanced character vector representation C′. Given a sentence sequence s_c = {c_1, c_2, ..., c_n} of length n, the character vectors C′ are then fine-tuned in the BERT model, and the lexicon-enhanced BERT character embedding vector can be expressed as:
E′_i = C′_i + E_s(i) + E_p(i);
where E_s and E_p denote the segment-vector and position-vector lookup tables, respectively. The resulting E′ is then fed into BERT, and each transformer block is computed as:
D = LN(H_{k-1} + MHA(H_{k-1}));
H_k = LN(FFN(D) + D);
where H_k denotes the hidden-state output of the k-th layer (H_0 = E′ denotes the bottom-layer character vectors); LN is the layer-normalization function; MHA is the multi-head self-attention module; FFN is a two-layer feed-forward neural network. Finally, the hidden-state output vector H of the last transformer layer is obtained and passed to the subsequent sequence-labeling and decoding tasks.
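For reference, a schematic of the post-layer-norm transformer block expressed by the two formulas above, D = LN(H + MHA(H)) and H_k = LN(FFN(D) + D); this is the standard BERT block written with PyTorch modules, not code taken from the patent, and the hidden sizes are the usual BERT-base values.

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=768, n_heads=12, d_ff=3072, dropout=0.1):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, n_heads, dropout=dropout,
                                         batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, h):
        # D = LN(H_{k-1} + MHA(H_{k-1}))
        attn_out, _ = self.mha(h, h, h)
        d = self.ln1(h + attn_out)
        # H_k = LN(FFN(D) + D)
        return self.ln2(self.ffn(d) + d)
```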
2) Computation in the LSTM network
After fine-tuning, the mixed-lattice embedding vectors already contain word-level semantic information. Because the character-word fusion process focuses on the word-sense associations between characters and words, a BiLSTM is adopted as the sequence-labeling layer of this embodiment's model, following the common practice of most NER models, in order to capture the global semantic information between the characters of a sentence and further improve NER performance. Given a sentence s_c = {c_1, c_2, ..., c_n}, the previous steps have produced the lexicon-enhanced character feature representation C′ = {c′_1, c′_2, ..., c′_n} and, after BERT fine-tuning, the hidden-state output matrix H; H is fed into a bidirectional LSTM network, which captures semantic information of the sentence from left to right and from right to left, respectively.
The hidden-state output of the forward LSTM is denoted h_i^fw and that of the backward LSTM h_i^bw. The output of the BiLSTM network is therefore the output of the sequence-labeling layer, which can be expressed as:
h_i = [h_i^fw ; h_i^bw];
where h_i denotes the concatenated hidden-state output of the i-th BiLSTM neuron, which we use as the character-level contextual semantic representation of c_i.
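A brief sketch of the sequence-labeling layer: a bidirectional LSTM over the BERT hidden states whose forward and backward hidden states are concatenated per character, exactly as in the h_i formula above; the hidden size is an illustrative choice.

```python
import torch.nn as nn

class SequenceLabelingLayer(nn.Module):
    def __init__(self, d_in=768, d_hidden=256):
        super().__init__()
        self.bilstm = nn.LSTM(d_in, d_hidden, batch_first=True, bidirectional=True)

    def forward(self, h_bert):
        # h_bert: (batch, n, d_in) hidden states of the last transformer layer
        h, _ = self.bilstm(h_bert)
        # h: (batch, n, 2 * d_hidden), each position is [forward ; backward]
        return h
```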
3) Decoding with the CRF network
After the model passes through the sequence-labeling layer, a standard CRF layer is used to predict the NER tags. Given the hidden-state output vectors of the last layer of the network H = {h_1, h_2, ..., h_n}, and letting y = {y_1, y_2, ..., y_n} denote a tag sequence, for a sentence s = {s_1, s_2, ..., s_n} the probability of its corresponding tag sequence is defined as:
P(y | s) = exp( Σ_{i=1..n} (W_{y_i} h_i + b_{(y_{i-1}, y_i)}) ) / Σ_{y′} exp( Σ_{i=1..n} (W_{y′_i} h_i + b_{(y′_{i-1}, y′_i)}) );
where y′ ranges over all possible tag sequences; W_{y_i} denotes the learnable weight parameters of the network corresponding to y_i, and b_{(y_{i-1}, y_i)} denotes the bias (transition score) between y_{i-1} and y_i. The negative log-likelihood is used as the loss function of the model, which can be expressed as:
L = − Σ log P(y | s), summed over the training sentences.
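A compact sketch of the CRF layer's negative log-likelihood: the score of a tag sequence sums per-position emission scores and tag-transition scores, and the partition function over all tag sequences is computed with the standard forward recursion. Start and stop transitions are omitted for brevity, and the tensor shapes and names are illustrative, not the patent's.

```python
import torch
import torch.nn as nn

class CRF(nn.Module):
    def __init__(self, d_hidden, n_tags):
        super().__init__()
        self.emit = nn.Linear(d_hidden, n_tags)                 # emission scores W h_i + b
        self.trans = nn.Parameter(torch.randn(n_tags, n_tags))  # transition b_{(y_{i-1}, y_i)}

    def neg_log_likelihood(self, h, tags):
        # h: (n, d_hidden) BiLSTM outputs for one sentence; tags: (n,) gold tag indices
        emissions = self.emit(h)                                 # (n, n_tags)
        # Score of the gold tag sequence.
        gold = emissions[torch.arange(len(tags)), tags].sum()
        gold = gold + self.trans[tags[:-1], tags[1:]].sum()
        # Log partition function over all tag sequences (forward algorithm).
        alpha = emissions[0]
        for i in range(1, emissions.size(0)):
            alpha = torch.logsumexp(alpha.unsqueeze(1) + self.trans, dim=0) + emissions[i]
        log_z = torch.logsumexp(alpha, dim=0)
        return log_z - gold                                      # -log P(y | s)
```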
Step 4: Model learning process
The proposed method is trained separately on four public datasets, and the model converges by optimizing the negative log-likelihood loss described above. The model uses a warmup learning-rate schedule; the BERT parameters use a learning rate of 1e-5 (a small learning rate during fine-tuning allows the model to converge quickly), the LSTM parameters use a learning rate of 1e-3, and a learning rate of 1e-4 is set for all other parameters. On all four datasets the model converges within 20 epochs. The experiments on the MSRA dataset were run on a V100 GPU, and those on the other datasets on a 1080Ti GPU. Because some variation was observed between results on different machines, the reported experimental results are averages over multiple runs.
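A sketch of the optimizer setup implied by this paragraph: separate learning rates of 1e-5 for the BERT parameters, 1e-3 for the LSTM parameters and 1e-4 for everything else, with a warmup schedule. The attribute names model.bert and model.bilstm, the choice of AdamW and the linear warmup-then-decay shape are assumptions, since the text only states the learning rates and the use of warmup.

```python
import torch

def build_optimizer(model, warmup_steps=1000, total_steps=10000):
    bert_params = list(model.bert.parameters())
    lstm_params = list(model.bilstm.parameters())
    covered = {id(p) for p in bert_params + lstm_params}
    other_params = [p for p in model.parameters() if id(p) not in covered]

    optimizer = torch.optim.AdamW([
        {"params": bert_params, "lr": 1e-5},   # fine-tuned BERT weights
        {"params": lstm_params, "lr": 1e-3},   # BiLSTM sequence-labeling layer
        {"params": other_params, "lr": 1e-4},  # all remaining parameters
    ])

    def warmup_then_decay(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_then_decay)
    return optimizer, scheduler
```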
This embodiment is evaluated on four public Chinese named entity recognition datasets and compared with other models; the tabulated data illustrate the effect of the invention.
Datasets:
The Weibo dataset is a public social-media dataset collected from the Sina Weibo website and contains four entity types: place names, person names, organizations and politically related entity names. The Resume dataset also comes from Sina social-media data and consists of financial resume data. The MSRA and OntoNotes4 datasets come from the public news domain and contain gold labels for the training data. The statistics of these datasets are shown in Table 1.
Table 1. Statistics of the datasets
Evaluation metric:
The F1 score, a metric commonly used for classification models, is used to compare the recognition accuracy of other models and of the present invention. Some preliminaries for computing the F1 score: TP (true positives) denotes samples assigned to the positive class correctly; TN (true negatives) denotes samples assigned to the negative class correctly; FP (false positives) denotes samples assigned to the positive class incorrectly; FN (false negatives) denotes samples assigned to the negative class incorrectly. Precision denotes the proportion of correctly assigned positive samples among all samples assigned to the positive class:
Precision = TP / (TP + FP);
Recall denotes the proportion of correctly assigned positive samples among all actual positive samples:
Recall = TP / (TP + FN);
The F1 score is a measure for classification problems and is often used as the final metric for multi-class problems; it is the harmonic mean of precision and recall. For a single class, the F1 score can be computed as:
F1 = 2 × Precision × Recall / (Precision + Recall).
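A small sketch of the precision, recall and F1 computation defined above, applied at the entity level: predicted and gold entities are compared as (start, end, type) spans, with span extraction from the tag sequences assumed to happen upstream.

```python
def f1_score(pred_entities, gold_entities):
    """pred_entities, gold_entities: sets of (start, end, type) spans."""
    tp = len(pred_entities & gold_entities)   # correctly predicted entities
    fp = len(pred_entities - gold_entities)   # predicted but not in the gold set
    fn = len(gold_entities - pred_entities)   # gold entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```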
Results:
The F1 scores on the four datasets are shown in the table below. This embodiment is compared not only with methods that use BERT pre-trained character vectors but also with other classic lexicon-enhancement SOTA methods. Using the same pre-trained character and word vectors, the proposed method reaches F1 scores of 71.88%, 96.22%, 95.72% and 81.75% on the four public datasets Weibo, Resume, MSRA and OntoNotes4, respectively. Compared with the classic Chinese NER models Lattice-LSTM, LR-CNN and FLAT, the proposed method achieves improvements of 8.46%, 0.69%, 5% and 1.37% on the four datasets, respectively. Compared with NER models that use BERT pre-trained character vectors, such as SoftLexicon-BERT, FLAT-BERT and LEBERT, the proposed method achieves slight improvements of 0.28%, 0.12% and 0.41% on the Resume, MSRA and OntoNotes4 datasets, while its F1 score on the Weibo dataset is significantly better than that of the SoftLexicon-BERT model, with a gain of 1.64%. This embodiment is an improvement based on BERT-BiLSTM; compared with the BERT baseline model without the lexicon enhancement network, it achieves performance gains of 4.55%, 0.54%, 1.82% and 0.91% on the four datasets, respectively, which shows that enhancing the feature representation of character vectors with lexical information is an effective way to improve Chinese NER performance. Compared with other lexicon-enhancement methods, the proposed feature fusion framework (MELSN) effectively fuses richer lexical semantic features; with the fine-tuning mechanism of the pre-trained BERT model, the lexicon-enhanced character representations contain more lexical semantics. Referring to Table 2, the comparative experimental results show that this embodiment makes better use of the character-word lattice structure for lexicon enhancement and demonstrate the efficiency of the proposed method on the Chinese named entity recognition task.
Table 2. Comparative experimental results
The above are only preferred embodiments of the present invention, and the scope of protection of the present invention is not limited to the above embodiments; all technical solutions falling under the idea of the present invention belong to its scope of protection. It should be pointed out that, for those of ordinary skill in the art, several improvements and refinements made without departing from the principle of the present invention shall also be regarded as falling within the scope of protection of the present invention.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210172667.4A CN114429132B (en) | 2022-02-24 | 2022-02-24 | Named entity identification method and device based on mixed lattice self-attention network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210172667.4A CN114429132B (en) | 2022-02-24 | 2022-02-24 | Named entity identification method and device based on mixed lattice self-attention network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114429132A true CN114429132A (en) | 2022-05-03 |
CN114429132B CN114429132B (en) | 2024-10-18 |
Family
ID=81312807
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210172667.4A Active CN114429132B (en) | 2022-02-24 | 2022-02-24 | Named entity identification method and device based on mixed lattice self-attention network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114429132B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114818714A (en) * | 2022-05-17 | 2022-07-29 | 四川苏格通讯技术有限公司 | Entity identification method based on optimized word coding under Bert model |
CN114818721A (en) * | 2022-06-30 | 2022-07-29 | 湖南工商大学 | Event joint extraction model and method combined with sequence labeling |
CN114912453A (en) * | 2022-05-20 | 2022-08-16 | 大连大学 | Chinese legal document named entity identification method based on enhanced sequence features |
CN115545035A (en) * | 2022-11-29 | 2022-12-30 | 城云科技(中国)有限公司 | Text entity recognition model and construction method, device and application thereof |
CN115935994A (en) * | 2022-12-12 | 2023-04-07 | 重庆邮电大学 | A method for intelligently identifying e-commerce titles |
CN116738992A (en) * | 2023-01-12 | 2023-09-12 | 重庆邮电大学 | Medical named entity recognition method based on graph attention network and word fusion |
CN117992924A (en) * | 2024-04-02 | 2024-05-07 | 云南师范大学 | A knowledge tracking method based on HyperMixer |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111783462A (en) * | 2020-06-30 | 2020-10-16 | 大连民族大学 | Chinese Named Entity Recognition Model and Method Based on Dual Neural Network Fusion |
CN113609857A (en) * | 2021-07-22 | 2021-11-05 | 武汉工程大学 | Legal named entity identification method and system based on cascade model and data enhancement |
WO2022005188A1 (en) * | 2020-07-01 | 2022-01-06 | Samsung Electronics Co., Ltd. | Entity recognition method, apparatus, electronic device and computer readable storage medium |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111783462A (en) * | 2020-06-30 | 2020-10-16 | 大连民族大学 | Chinese Named Entity Recognition Model and Method Based on Dual Neural Network Fusion |
WO2022005188A1 (en) * | 2020-07-01 | 2022-01-06 | Samsung Electronics Co., Ltd. | Entity recognition method, apparatus, electronic device and computer readable storage medium |
CN113609857A (en) * | 2021-07-22 | 2021-11-05 | 武汉工程大学 | Legal named entity identification method and system based on cascade model and data enhancement |
Non-Patent Citations (4)
Title |
---|
- He Z., Wang L., et al.: "MLSAN: Mixed-Lattice Self-Attention Network for Chinese Named Entity Recognition", 2022 26th International Conference on Pattern Recognition (ICPR), 29 November 2022, pages 1436-1442 *
- Y. Chang et al.: "Chinese named entity recognition method based on BERT", 2021 IEEE International Conference on Data Science and Computer Application (ICDSCA), 23 December 2021, pages 294-299 *
- Peng Jiayuan: "Research on Chinese Named Entity Recognition Methods Incorporating Glyph Features", China Master's Theses Full-text Database, Information Science and Technology, no. 01, 15 January 2022, pages 138-3206 *
- Xie Teng et al.: "Chinese Entity Recognition Based on the BERT-BiLSTM-CRF Model", Computer Systems & Applications, 15 July 2020, pages 52-59 *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114818714A (en) * | 2022-05-17 | 2022-07-29 | 四川苏格通讯技术有限公司 | Entity identification method based on optimized word coding under Bert model |
CN114818714B (en) * | 2022-05-17 | 2025-01-10 | 四川苏格通讯技术有限公司 | An entity recognition method based on optimized word encoding under Bert model |
CN114912453A (en) * | 2022-05-20 | 2022-08-16 | 大连大学 | Chinese legal document named entity identification method based on enhanced sequence features |
CN114818721A (en) * | 2022-06-30 | 2022-07-29 | 湖南工商大学 | Event joint extraction model and method combined with sequence labeling |
CN114818721B (en) * | 2022-06-30 | 2022-11-01 | 湖南工商大学 | Event joint extraction model and method combined with sequence labeling |
CN115545035A (en) * | 2022-11-29 | 2022-12-30 | 城云科技(中国)有限公司 | Text entity recognition model and construction method, device and application thereof |
CN115545035B (en) * | 2022-11-29 | 2023-02-17 | 城云科技(中国)有限公司 | Text entity recognition model and construction method, device and application thereof |
CN115935994A (en) * | 2022-12-12 | 2023-04-07 | 重庆邮电大学 | A method for intelligently identifying e-commerce titles |
CN115935994B (en) * | 2022-12-12 | 2024-03-08 | 芽米科技(广州)有限公司 | Method for intelligently identifying current label questions |
CN116738992A (en) * | 2023-01-12 | 2023-09-12 | 重庆邮电大学 | Medical named entity recognition method based on graph attention network and word fusion |
CN117992924A (en) * | 2024-04-02 | 2024-05-07 | 云南师范大学 | A knowledge tracking method based on HyperMixer |
CN117992924B (en) * | 2024-04-02 | 2024-06-07 | 云南师范大学 | HyperMixer-based knowledge tracking method |
Also Published As
Publication number | Publication date |
---|---|
CN114429132B (en) | 2024-10-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114169330B (en) | Chinese named entity recognition method integrating time sequence convolution and transform encoder | |
CN114429132B (en) | 2024-10-18 | Named entity identification method and device based on mixed lattice self-attention network |
CN109635124B (en) | A Remote Supervision Relation Extraction Method Combined with Background Knowledge | |
CN110020438B (en) | Sequence identification based enterprise or organization Chinese name entity disambiguation method and device | |
CN113190656B (en) | A Chinese Named Entity Extraction Method Based on Multi-Annotation Framework and Fusion Features | |
CN113255294B (en) | Named entity recognition model training method, recognition method and device | |
CN114757182B (en) | A BERT short text sentiment analysis method with improved training method | |
CN108874997A (en) | A kind of name name entity recognition method towards film comment | |
Chen et al. | A deep neural network model for target-based sentiment analysis | |
CN117076653A (en) | Knowledge base question-answering method based on thinking chain and visual lifting context learning | |
CN112101027A (en) | Chinese Named Entity Recognition Method Based on Reading Comprehension | |
CN114818717B (en) | Chinese named entity recognition method and system integrating vocabulary and syntax information | |
CN116521882A (en) | Domain Long Text Classification Method and System Based on Knowledge Graph | |
CN116796744A (en) | Entity relation extraction method and system based on deep learning | |
CN114912453A (en) | Chinese legal document named entity identification method based on enhanced sequence features | |
CN114238649A (en) | Common sense concept enhanced language model pre-training method | |
CN118332121A (en) | Front-end text analysis method based on multitask learning | |
CN113065349A (en) | A Conditional Random Field-Based Named Entity Recognition Method | |
US11822887B2 (en) | Robust name matching with regularized embeddings | |
CN117573869B (en) | A method for extracting key elements of network access resources | |
CN117786110A (en) | Text classification method and system based on multi-scale quantum convolutional neural network | |
Xu et al. | Multi text classification model based on bret-cnn-bilstm | |
CN114595338A (en) | A system and method for joint entity-relation extraction based on hybrid feature representation | |
Xu et al. | Incorporating forward and backward instances in a bi-lstm-cnn model for relation classification | |
CN118503411B (en) | Outline generation method, model training method, device and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |