
CN111353040A - GRU-based attribute level emotion analysis method - Google Patents


Info

Publication number
CN111353040A
CN111353040A (application CN201910459539.6A)
Authority
CN
China
Prior art keywords
sentence
layer
word
vector
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910459539.6A
Other languages
Chinese (zh)
Inventor
邢永平
禹晶
肖创柏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN201910459539.6A priority Critical patent/CN111353040A/en
Publication of CN111353040A publication Critical patent/CN111353040A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an attribute-level sentiment analysis method. Sentiment analysis is a fundamental task in natural language processing, and attribute-level (aspect-level) sentiment analysis is an important topic within it. Different words in a sentence affect the sentiment polarity of an aspect in the sentence differently; the key to the problem is modeling both the relationship between the aspect and the words of the sentence and the meaning of the sentence as a whole. Two recurrent networks are used to model the sentence information, and an attention mechanism is introduced to fuse in the aspect information, with the aim of achieving better results. Experiments on a public dataset show that the proposed algorithm achieves better results without complicated feature engineering.

Description

Attribute-Level Sentiment Analysis Method Based on GRU

Technical Field

The present invention relates to the field of the Internet and, more particularly, to a GRU-based attribute-level sentiment analysis method.

Background

With the rapid development of the Internet, the volume of text keeps growing, and extracting useful information from massive amounts of text has become increasingly important. This flood of text has also objectively driven the development of natural language processing, and deep learning has brought a new direction to the field. Sentiment analysis (also called opinion mining) is a basic yet important task in natural language processing. Enterprises can use customer reviews of their products to obtain timely feedback and inform decision-making. How to extract sentiment information from massive text data has therefore become an important research topic in natural language processing in recent years.

Current research on text sentiment analysis is mainly based on sentiment lexicons or on machine learning. Lexicon-based methods depend on a sentiment lexicon, which strongly influences the analysis; Yang Ding et al. processed and represented text with a sentiment lexicon to build a classifier based on naive Bayes theory. The other line of work is based on machine learning, which trains a sentiment classifier on manually labeled data; experiments have demonstrated the excellent classification performance of support vector machines. Both approaches require manually labeled data for lexicon construction and feature engineering, which is tedious and complex, whereas deep learning algorithms can handle this well. In recent years deep learning has achieved great success in natural language processing, for example in machine translation and question answering. It has also been applied to sentiment analysis: Socher et al. proposed a deep learning method based on semi-supervised recursive autoencoders (RAE) for text sentiment classification, and Jurgovsky et al. used convolutional neural networks (CNN) for the same task. Text sentiment analysis can be carried out at the document, sentence, or word level. This work focuses on aspect-based sentiment analysis, because the sentiment polarity of different aspects in the same sentence can differ: in the sentence "The voice quality of this phone is not good, but the battery life is long.", the evaluation is negative for voice quality but positive for battery life. Wang et al. proposed the AE-LSTM, AT-LSTM, and AEAT-LSTM recurrent network algorithms for aspect-level sentiment analysis, fusing aspect information into a long short-term memory (LSTM) network to improve classification accuracy. The SVM-dep algorithm separates features into aspect-dependent and aspect-independent features and extracts them respectively to perform attribute-level sentiment analysis; its accuracy exceeds that of a support vector machine classifier without aspect features.

The attention mechanism selectively concentrates on certain important pieces of information during processing while ignoring information that is only weakly related to the target. By emphasizing the essential aspects of the information and concentrating limited resources on what matters most, it has achieved great success in fields such as image recognition and machine translation. For the problem addressed here, attention allows aspect-based sentiment analysis to focus on aspect-related information and thereby improve classification accuracy.

Recurrent neural networks (RNNs) are widely used in natural language processing because their memory lets them process contextual information; typical variants include the long short-term memory network (LSTM), the gated recurrent unit (GRU), and MUT networks. This work proposes an algorithm for aspect-level sentiment analysis based on GRU networks and fuses aspect information into the model through an attention mechanism, so that the model attends to the influence of the aspect on sentiment classification and the classification accuracy improves.
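
For reference, the GRU cell referred to throughout updates its hidden state with the standard gating equations below (these are the textbook GRU equations, not restated in this filing; biases are omitted, σ is the logistic sigmoid, ⊙ denotes element-wise multiplication, and the roles of z_t and 1 - z_t in the last line vary between references):

z_t = σ(W_z x_t + U_z h_{t-1})

r_t = σ(W_r x_t + U_r h_{t-1})

h̃_t = tanh(W x_t + U(r_t ⊙ h_{t-1}))

h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ h̃_t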

Summary of the Invention

In view of this, the purpose of the present invention is to provide an attribute-level sentiment analysis model and method based on a GRU network. The invention uses an Att-CGRU attribute-level sentiment classification algorithm to improve sentiment classification accuracy.

To achieve the above purpose, the GRU-based attribute-level sentiment analysis model designed by the present invention is as follows:

In the Att-CGRU model, an attention mechanism is introduced to reflect the important influence of the aspect on the sentiment polarity of the whole sentence. In sequence processing, the encoder-decoder is a commonly used model: by assigning different weights to the hidden-state vectors of the encoder output according to the algorithm and the task objective, it extracts a vector representation that characterizes the input data as well as possible and thereby improves model performance; in essence, this concentrates limited resources on the information most relevant to the target task. The structure of the Att-CGRU model is shown in Figure 1 of the description. The model comprises five parts: the input layer, the embedding layer, the GRU layer, the attention layer, and the output layer. The input layer feeds short texts, i.e., sentences, into the model; the embedding layer maps each word in the sentence to a vector; the GRU layer extracts feature information from the word embeddings; the attention layer implements the attention mechanism, fusing word-level features into sentence-level features through weight computation to produce a sentence feature vector; finally, the sentence feature vector is classified.
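
Purely as an illustrative sketch of this five-part structure (the filing contains no code; the class name, parameter names, and layer choices below are assumptions, with dimensions taken from the embodiment described later), the model can be laid out in PyTorch as follows:

```python
import torch.nn as nn

class AttCGRU(nn.Module):
    """Sketch of the five-part Att-CGRU structure: input -> embedding ->
    left/right GRUs -> attention -> softmax output (Eqs. (1)-(7) below)."""

    def __init__(self, vocab_size, emb_dim=200, hidden_dim=100, n_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)            # Eq. (1)
        # each GRU reads [word vector : aspect vector] inputs, Eq. (2)
        self.gru_left = nn.GRU(2 * emb_dim, hidden_dim, batch_first=True)
        self.gru_right = nn.GRU(2 * emb_dim, hidden_dim, batch_first=True)
        self.W_h = nn.Linear(hidden_dim, hidden_dim, bias=False)  # Eq. (3)
        self.W_v = nn.Linear(emb_dim, hidden_dim, bias=False)     # Eq. (3)
        self.w = nn.Linear(2 * hidden_dim, 1, bias=False)         # Eq. (4)
        self.W_p = nn.Linear(hidden_dim, hidden_dim, bias=False)  # Eq. (6)
        self.W_x = nn.Linear(hidden_dim, hidden_dim, bias=False)  # Eq. (6)
        self.out = nn.Linear(hidden_dim, n_classes)               # Eq. (7)
```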

1.1 Input layer

At the input layer, each sentence requiring sentiment polarity classification is input. Assuming the sentence length is T, the sentence is expressed as s = {x_1, x_2, ..., x_T}, where x_i denotes the i-th word of the sentence.

1.2 Embedding layer

For the sentence s = {x_1, x_2, ..., x_T} of T words received from the input layer, each word obtains its corresponding word vector e_i in the embedding layer.

First, the word vector of each word is obtained from the word embedding matrix W^wrd ∈ R^{d_w × |V|}, where |V| is the size of the vocabulary and d_w is the word-vector dimension, which can be specified; then

emb_i = W^wrd v_i    (1)

where v_i is a one-hot vector of length |V| that is 1 at position i and 0 elsewhere. The aspect word vector emb_asp is obtained in the same way; when the aspect consists of several words, the word vectors of those words are summed dimension-wise to obtain the aspect vector. Then emb_i and emb_asp are concatenated to give the final word vector e_i:

e_i = [emb_i : emb_asp]    (2)

Finally, e = {e_1, e_2, ..., e_T} is passed to the next layer.
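
A minimal sketch of this embedding step (assuming integer word ids, the AttCGRU sketch above, and the dimension-wise sum for multi-word aspects described above):

```python
import torch

def build_inputs(embed, word_ids, aspect_ids):
    """embed: an nn.Embedding; word_ids: LongTensor [T]; aspect_ids: LongTensor
    holding the aspect's word ids. Returns e with e_i = [emb_i : emb_asp]."""
    emb = embed(word_ids)                   # Eq. (1): one lookup per word, [T, emb_dim]
    emb_asp = embed(aspect_ids).sum(dim=0)  # dimension-wise sum for multi-word aspects
    e = torch.cat([emb, emb_asp.expand_as(emb)], dim=-1)  # Eq. (2): [T, 2*emb_dim]
    return e, emb_asp
```

Concatenating the aspect vector onto every position is what lets the recurrent layers condition on the aspect from the first time step onward.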

1.3 GRU layer

In the GRU layer, the sentence is split at the aspect into left and right parts so as to model the aspect's context, with the structure shown in Figure 1: {x_{l+1}, x_{l+2}, ..., x_{r-1}} denotes the aspect, {x_1, x_2, ..., x_l} the words before the aspect, and {x_r, ..., x_T} the words after it. After the left sequence {x_1, ..., x_{r-1}} and the right sequence {x_{l+1}, ..., x_T} are fed into the left and right networks, the hidden layers yield {h_1, h_2, ..., h_{r-1}} and {h_{l+1}, h_{l+2}, ..., h_T}, respectively.
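
Continuing the sketch (the text's 1-based indices are mapped to Python slices; the filing does not state in which direction the right-hand network reads its sequence, so it is read left-to-right here):

```python
def encode_context(gru_left, gru_right, e, l, r):
    """e: [T, 2*emb_dim] from the embedding layer; the aspect occupies the
    1-based positions l+1 .. r-1. Returns {h_1..h_{r-1}} and {h_{l+1}..h_T}."""
    h_left, _ = gru_left(e[:r - 1].unsqueeze(0))   # x_1 .. x_{r-1}
    h_right, _ = gru_right(e[l:].unsqueeze(0))     # x_{l+1} .. x_T
    return h_left.squeeze(0), h_right.squeeze(0)
```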

1.4 Attention layer

An attention mechanism is introduced into the model to obtain a better classification effect: different words in the two parts of the sentence relate to the aspect to different degrees, so the model should attend more to the information closely tied to the aspect. The attention mechanism is implemented as follows:

M = tanh([W_h H ; W_v (e_asp ⊗ e_N)])    (3)

a_t = softmax(w^T M)    (4)

r = H a_t    (5)

Here a_t is the vector of attention weights; e_asp ⊗ e_N denotes repeating e_asp N times so that its dimensions match those of H; H is the matrix formed by the hidden-layer outputs of the model; r is the weighted vector representing the meaning of the sentence; and W_h, W_v, and w are parameter matrices. The vector o that finally represents the sentence information is then obtained as

o = tanh(W_p r + W_x h)    (6)

where h is the sum of the vectors h_{r-1} and h_{l+1}.
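
Eqs. (3) to (7) fit together as in the hedged sketch below; it assumes that H simply stacks the hidden states of the two GRUs (how H is assembled from the two networks is not spelled out in the filing) and reuses the parameter names from the AttCGRU sketch above:

```python
import torch

def attention_classify(m, h_left, h_right, emb_asp):
    """m: an AttCGRU instance; h_left, h_right: hidden states of the two GRUs;
    emb_asp: the aspect vector. Returns the predicted polarity distribution."""
    H = torch.cat([h_left, h_right], dim=0)        # [N, hidden_dim]
    e_N = emb_asp.expand(H.size(0), -1)            # e_asp repeated N times
    M = torch.tanh(torch.cat([m.W_h(H), m.W_v(e_N)], dim=-1))  # Eq. (3)
    a = torch.softmax(m.w(M).squeeze(-1), dim=0)   # Eq. (4): attention weights a_t
    r = a @ H                                      # Eq. (5): r = H a_t
    h = h_left[-1] + h_right[0]                    # h = h_{r-1} + h_{l+1}
    o = torch.tanh(m.W_p(r) + m.W_x(h))            # Eq. (6)
    return torch.softmax(m.out(o), dim=-1)         # Eq. (7): predicted polarity
```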

1.5 Output layer

Finally, the output o of the attention layer is fed into the classifier

ŷ = softmax(W_o o + b_o)    (7)

to perform sentiment polarity classification, where W_o and b_o are parameter matrices learned in training.

The specific experimental steps of the method are as follows:

Step S1: the Twitter dataset used in the present invention is first fed into the input layer of the Att-CGRU model.

Step S2: the data from S1 are fed into the embedding layer to obtain the word vector of each word in the input sentence.

Step S3: after the word vector of each word has been obtained as in S2, the GRU layer takes the aspect words {x_{l+1}, x_{l+2}, ..., x_{r-1}} as the dividing point and feeds the word vectors of the preceding words {x_1, x_2, ..., x_l} together with the aspect into the left GRU network, and the aspect together with the following words into the right GRU network; the two networks model the aspect's context, and their hidden layers output {h_1, h_2, ..., h_{r-1}} and {h_{l+1}, h_{l+2}, ..., h_T}, respectively.

Step S4: from the output of S3, the vector o representing the sentence information is computed by the following formulas:

M = tanh([W_h H ; W_v (e_asp ⊗ e_N)])

a_t = softmax(w^T M)

r = H a_t

Here r is the weighted vector characterizing the meaning of the sentence; a_t is the vector of attention weights, obtained by feeding w^T M into the softmax function; M is derived from the matrix H formed by the hidden-layer outputs of the model's GRU layer; e_asp ⊗ e_N denotes repeating the aspect word vector e_asp N times so that it matches the dimensions of H; tanh denotes the tanh function; and W_h, W_v, and w are parameter matrices. Finally, the vector o that represents the sentence information is obtained:

o = tanh(W_p r + W_x h)

where h is the sum of the vectors h_{r-1} and h_{l+1}; h_{r-1} is the hidden-layer output for the (r-1)-th word in the left GRU network, h_{l+1} is the hidden-layer output for the (l+1)-th word in the right GRU network, and W_p and W_x are parameter matrices.

Step S5: the output layer feeds the sentence vector o into the softmax function to obtain the predicted sentiment polarity ŷ, computed as ŷ = softmax(W_o o + b_o), where W_o and b_o are both parameter matrices.

Step S6: the loss function value is computed from the output of S5 and the true class y of each sentence:

loss = −Σ_i Σ_j y_i^j log(ŷ_i^j) + λ||θ||²

where λ is the regularization coefficient; training iterates via the error back-propagation algorithm until the accuracy reaches its maximum, and the optimization algorithm within back-propagation is AdaGrad with an initialization coefficient of 0.01.
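
A sketch of the corresponding training step, under two interpretations that the filing does not state verbatim: the 0.01 initialization coefficient is taken to be AdaGrad's learning rate, and the λ||θ||² term is realized as weight decay:

```python
import torch
import torch.nn as nn

model = AttCGRU(vocab_size=10000)  # hypothetical vocabulary size
criterion = nn.CrossEntropyLoss()  # cross-entropy over the polarity classes
# weight_decay plays the role of the lambda * ||theta||^2 term (lambda = 0.001)
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01, weight_decay=0.001)

def train_step(logits, y_true):
    """logits: [batch, n_classes] pre-softmax scores; y_true: [batch] class ids."""
    loss = criterion(logits, y_true)  # CrossEntropyLoss applies softmax internally
    optimizer.zero_grad()
    loss.backward()                   # error back-propagation
    optimizer.step()                  # AdaGrad update
    return loss.item()
```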

Compared with the prior art, the present invention has the following technical effects.

The method was compared experimentally with traditional machine learning methods (the support vector machine and SVM-dep algorithms) and with deep learning methods (AdaRNN-w/E, AdaRNN-comb, TC-LSTM), with each model evaluated by accuracy; the results are shown in the table below:

Table 1. Experimental results

(The table of accuracy results is provided as an image in the original publication.)

Description of the Drawings

Figure 1 shows the structure of the Att-CGRU model.

To explain the technical solutions of the embodiments of the present invention more clearly, the drawing used in the description is briefly introduced here. The Att-CGRU model of Figure 1 comprises five parts: the input layer, the embedding layer, the GRU layer, the attention layer, and the output layer. The input layer feeds short texts, i.e., sentences, into the model; the embedding layer maps each word in the sentence to a vector; the GRU layer extracts feature information from the word embeddings; the attention layer implements the attention mechanism, fusing word-level features into sentence-level features through weight computation to produce a sentence feature vector; finally, the sentence feature vector is classified.

Detailed Description

The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawing. Obviously, the described embodiments are only some of the embodiments of the invention rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

To implement the invention, a dataset must first be collected; the dataset used here is a basic dataset collected from Twitter.

The specific experimental steps of the algorithm are as follows:

Step S1: the dataset used here is a basic dataset collected from Twitter, in which every training and test example has been manually labeled. The training set, used to fit the model, contains 6248 sentences; the test set, used to measure model performance, contains 692 sentences. In both sets, positive, negative, and neutral examples account for 25%, 25%, and 50% of the data, respectively.

Step S2: the model comprises five parts: the input layer, the embedding layer, the GRU layer, the attention layer, and the output layer. The input layer feeds short texts, i.e., sentences, into the model; a sentence is expressed as s = {x_1, x_2, ..., x_T}, where x_i is the i-th word of the sentence and T is the sentence length. The embedding layer maps each word x_i to a word vector e_i = [emb_i : emb_asp] according to the word-vector dictionary, where emb_i is the word vector of the i-th word in the dictionary and emb_asp is the word vector of the aspect; when the aspect consists of several words, the mean of their word vectors is taken. On top of the semantic features obtained from the embedding layer, the GRU layer splits the sentence at the aspect into left and right parts to model the aspect's context, with the structure shown in Figure 1: {x_{l+1}, x_{l+2}, ..., x_{r-1}} denotes the aspect, {x_1, x_2, ..., x_l} the words before the aspect, and {x_r, ..., x_T} the words after it. After the left and right sequences are fed into the left and right GRU networks, the hidden layers yield {h_1, h_2, ..., h_{r-1}} and {h_{l+1}, h_{l+2}, ..., h_T}, respectively. The attention layer implements the attention mechanism: it fuses word-level features into sentence-level features through weight computation to produce a sentence feature vector, which is finally classified. Its implementation is as follows:

M = tanh([W_h H ; W_v (e_asp ⊗ e_N)])

a_t = softmax(w^T M)

r = H a_t

Here r is the weighted vector characterizing the meaning of the sentence; a_t is the vector of attention weights, obtained by feeding w^T M into the softmax function; M is derived from the matrix H formed by the hidden-layer outputs of the model's GRU layer; e_asp ⊗ e_N denotes repeating the aspect word vector e_asp N times so that it matches the dimensions of H; tanh denotes the tanh function; and W_h, W_v, and w are parameter matrices. Finally, the vector o that represents the sentence information is obtained:

o = tanh(W_p r + W_x h)

where h is the sum of the vectors h_{r-1} and h_{l+1}; h_{r-1} is the hidden-layer output for the (r-1)-th word in the left GRU network, h_{l+1} is the hidden-layer output for the (l+1)-th word in the right GRU network, and W_p and W_x are parameter matrices. The output layer feeds the sentence vector o into the softmax function to obtain the predicted sentiment polarity ŷ, computed as ŷ = softmax(W_o o + b_o), where W_o and b_o are both parameter matrices.

Step S3: cross-entropy is used as the loss function when training the model, with ŷ denoting the prediction. Training minimizes the cross-entropy between the true polarity y of all sentences and the prediction ŷ:

loss = −Σ_i Σ_j y_i^j log(ŷ_i^j) + λ||θ||²

Here j indexes the sentiment polarity classes (positive, negative, and neutral in this work), i is the index of the sentence, λ is the L2-norm regularization coefficient, and θ are the parameters to be learned; the dropout probability is set to 0.5 to prevent overfitting. Each word of a sentence is initialized with a 200-dimensional word vector, the hidden-layer dimension is 100, and the other parameter matrices are initialized by uniformly distributed sampling. The model is trained in batches of 20 sentences. The L2 regularization coefficient λ is 0.001, and the optimization algorithm is AdaGrad with an initialization coefficient of 0.01.
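
The stated hyperparameters, collected in one place as a sketch (the uniform-initialization bound and the placement of dropout are assumptions; the filing gives only the dropout probability and says "uniformly distributed sampling"):

```python
import torch

EMB_DIM, HIDDEN_DIM = 200, 100  # word-vector and hidden-layer dimensions
BATCH_SIZE = 20                 # sentences per training batch
DROPOUT_P = 0.5                 # dropout probability, against overfitting
L2_LAMBDA, ADAGRAD_LR = 0.001, 0.01

dropout = torch.nn.Dropout(p=DROPOUT_P)  # where dropout is applied is unspecified

def init_uniform(model, bound=0.1):
    """Initialize parameter matrices by uniform sampling (bound assumed)."""
    for p in model.parameters():
        if p.dim() > 1:
            torch.nn.init.uniform_(p, -bound, bound)
```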

Step S4: in the experiments, the method was compared with traditional machine learning methods (the support vector machine and SVM-dep algorithms) and with deep learning methods (AdaRNN-w/E, AdaRNN-comb, TC-LSTM), with each model evaluated by accuracy; the results are shown in the table below:

Table 1. Experimental results

(The table of accuracy results is provided as an image in the original publication.)

The experimental data show that modeling the sentence with the two left and right networks while introducing an attention mechanism based on the aspect words has a certain advantage in accuracy over the other models.

Claims (2)

1. An attribute-level sentiment analysis model based on a GRU network, characterized in that: the model comprises five parts, namely an input layer, an embedding layer, a GRU layer, an attention layer and an output layer; the input layer inputs short texts, namely sentences, into the model; the embedding layer maps each word in the sentence into a vector; the GRU layer obtains feature information from the word embeddings; the attention layer realizes an attention mechanism, which fuses the word-level feature information into sentence-level feature information through weight calculation to generate a sentence feature vector; finally, the sentence feature vector is classified;

1.1 Input layer

Each sentence requiring sentiment polarity classification is input at the input layer; assuming the sentence length is T, the sentence is expressed as s = {x_1, x_2, ..., x_T}, where x_i denotes the i-th word in the sentence;

1.2 Embedding layer

For the sentence s = {x_1, x_2, ..., x_T} of T words obtained from the input layer, the corresponding word vector e_i of each word is obtained in the embedding layer;

first, the word vector of each word is obtained from the word embedding matrix W^wrd ∈ R^{d_w × |V|}, where |V| is the length of the vocabulary and d_w is the word-vector dimension, which can be specified; then

emb_i = W^wrd v_i    (1)

where v_i is a vector of length |V| that is 1 at position i and 0 elsewhere; likewise, the word vector emb_asp of the aspect is obtained; when the aspect in the sentence consists of a plurality of words, the values of the same dimension of the word vector of each word are added to obtain the word vector of the aspect; then emb_i and emb_asp are concatenated to obtain the final word vector e_i:

e_i = [emb_i : emb_asp]    (2)

finally, e = {e_1, e_2, ..., e_T} is input to the next layer;

1.3 GRU layer

In the GRU layer, the sentence is divided into left and right parts with the aspect as the dividing point to model the context of the aspect, wherein {x_{l+1}, x_{l+2}, ..., x_{r-1}} denotes the aspect, {x_1, x_2, ..., x_l} denotes the words before the aspect in the sentence, and {x_r, ..., x_T} denotes the words after the aspect; after the left and right sequences are input into the left and right networks, the hidden layers respectively obtain {h_1, h_2, ..., h_{r-1}} and {h_{l+1}, h_{l+2}, ..., h_T};

1.4 Attention layer

An attention mechanism is introduced into the model to obtain a better classification effect, because the different words of the front and rear parts of the sentence bear different relations to the aspect, and more attention is paid to the information closely related to the aspect; the attention mechanism is implemented as follows:

M = tanh([W_h H ; W_v (e_asp ⊗ e_N)])    (3)

a_t = softmax(w^T M)    (4)

r = H a_t    (5)

where a_t denotes the attention weight coefficients, e_asp ⊗ e_N denotes repeating e_asp multiple times until its dimension is consistent with that of H, H is a matrix formed by the hidden-layer outputs in the model, r denotes the weighted vector representing the meaning of the sentence, and W_h, W_v, w are parameter matrices; the vector o that finally represents the sentence information is then obtained as

o = tanh(W_p r + W_x h)    (6)

where h denotes the sum of the vectors h_{r-1} and h_{l+1};

1.5 Output layer

Finally, the output o of the attention layer is input into the classifier

ŷ = softmax(W_o o + b_o)    (7)

to implement the polarity classification of the sentiment, where W_o and b_o are parameter matrices to be trained.
2. An attribute-level sentiment analysis method based on a GRU, characterized by comprising the following specific steps:

Step S1: first, the collected Twitter dataset is input to the input layer of the Att-CGRU model;

Step S2: the data obtained in step S1 are input into the embedding layer to obtain the word vector of each word in the input sentence;

Step S3: after the word vector of each word in the sentence is obtained in the GRU layer by means of S2, with the aspect words {x_{l+1}, x_{l+2}, ..., x_{r-1}} as the dividing point, the word vectors of the left part {x_1, x_2, ..., x_l} and of the right part are input into the left and right GRU networks to model the context of the aspect words respectively, and the outputs {h_1, h_2, ..., h_{r-1}} and {h_{l+1}, h_{l+2}, ..., h_T} are obtained from the hidden layers;

Step S4: according to the output of S3, the vector o capable of representing the sentence information is calculated by the following formulas:

M = tanh([W_h H ; W_v (e_asp ⊗ e_N)])

a_t = softmax(w^T M)

r = H a_t

where r denotes the weighted vector characterizing the meaning of the sentence; a_t denotes the attention weight coefficients, obtained by inputting w^T M into the softmax function; M denotes a matrix derived from the matrix H formed by the hidden-layer outputs of the GRU layer of the model; e_asp ⊗ e_N denotes repeating the aspect word vector e_asp multiple times until its dimension is consistent with that of H; tanh denotes the tanh function; and W_h, W_v, w are parameter matrices; finally, the vector o that represents the sentence information is obtained as

o = tanh(W_p r + W_x h)

where h denotes the sum of the vectors h_{r-1} and h_{l+1}, h_{r-1} denotes the hidden-layer output corresponding to the (r-1)-th word in the left GRU network, h_{l+1} denotes the hidden-layer output corresponding to the (l+1)-th word in the right GRU network, and W_p and W_x denote parameter matrices;

Step S5: the output layer inputs the vector o representing the sentence information into the softmax function to obtain the predicted sentiment polarity ŷ, specifically ŷ = softmax(W_o o + b_o), where W_o and b_o are both parameter matrices;

Step S6: the loss function value is calculated according to the output of S5 and the actual class y corresponding to each sentence:

loss = −Σ_i Σ_j y_i^j log(ŷ_i^j) + λ||θ||²

wherein λ is the regularization coefficient, training iterates through the error back-propagation algorithm until the accuracy reaches its maximum, and the optimization algorithm within back-propagation is the AdaGrad algorithm with an initialization coefficient of 0.01.
CN201910459539.6A 2019-05-29 2019-05-29 GRU-based attribute level emotion analysis method Pending CN111353040A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910459539.6A CN111353040A (en) 2019-05-29 2019-05-29 GRU-based attribute level emotion analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910459539.6A CN111353040A (en) 2019-05-29 2019-05-29 GRU-based attribute level emotion analysis method

Publications (1)

Publication Number Publication Date
CN111353040A true CN111353040A (en) 2020-06-30

Family

ID=71196950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910459539.6A Pending CN111353040A (en) 2019-05-29 2019-05-29 GRU-based attribute level emotion analysis method

Country Status (1)

Country Link
CN (1) CN111353040A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190005027A1 (en) * 2017-06-29 2019-01-03 Robert Bosch Gmbh System and Method For Domain-Independent Aspect Level Sentiment Detection
CN108595601A (en) * 2018-04-20 2018-09-28 福州大学 A kind of long text sentiment analysis method incorporating Attention mechanism
CN108984724A (en) * 2018-07-10 2018-12-11 凯尔博特信息科技(昆山)有限公司 It indicates to improve particular community emotional semantic classification accuracy rate method using higher-dimension
CN109145304A (en) * 2018-09-07 2019-01-04 中山大学 A kind of Chinese Opinion element sentiment analysis method based on word

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MEISHAN ZHANG: "Gated Neural Networks for Targeted Sentiment Analysis", Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence *
YEQUAN WANG: "Attention-based LSTM for Aspect-level Sentiment Classification", Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing *
ZHAI PENGHUA: "Bidirectional-GRU Based on Attention Mechanism for Aspect-level Sentiment Analysis", Proceedings of the 2019 11th International Conference on Machine Learning and Computing *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112131886A (en) * 2020-08-05 2020-12-25 浙江工业大学 Method for analyzing aspect level emotion of text
CN111813895A (en) * 2020-08-07 2020-10-23 深圳职业技术学院 An attribute-level sentiment analysis method based on hierarchical attention mechanism and gate mechanism
CN111813895B (en) * 2020-08-07 2022-06-03 深圳职业技术学院 Attribute level emotion analysis method based on level attention mechanism and door mechanism
CN113849646A (en) * 2021-09-28 2021-12-28 西安邮电大学 Text emotion analysis method
CN114492521A (en) * 2022-01-21 2022-05-13 成都理工大学 Intelligent lithology while drilling identification method and system based on acoustic vibration signals
CN115098631A (en) * 2022-06-23 2022-09-23 浙江工商大学 Sentence-level emotion analysis method based on text capsule neural network
CN115098631B (en) * 2022-06-23 2024-08-02 浙江工商大学 Sentence-level emotion analysis method based on text capsule neural network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200630)