CN112949476B - Text relation detection method, device and storage medium based on graph convolution neural network - Google Patents
- Publication number: CN112949476B (application CN202110224515.XA)
- Authority: CN (China)
- Prior art keywords: key information, edge, text, block, information block
- Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Abstract
This application relates to a text relationship detection method, device and storage medium based on a graph convolutional neural network, and belongs to the field of computer technology. The method includes: acquiring multiple key information blocks of text information in a target image, where each text block in a key information block includes at least one character string; inputting the character string of each text block in each key information block into a node feature extraction model to obtain the node features of the key information block; constructing connectivity relationships between each text block in the key information block and the text blocks in other key information blocks; determining the edge features of the key information block based on the connectivity relationships corresponding to each key information block and the position information corresponding to each connectivity relationship; inputting the node features and the edge features into a pre-trained graph convolutional neural network to obtain the edge types between the key information blocks; and determining that key information blocks with the same edge type are associated. The method can improve the accuracy and efficiency of association recognition.
Description
【Technical Field】
This application relates to a text relationship detection method, device and storage medium based on a graph convolutional neural network, and belongs to the field of computer technology.
【Background】
Text relationship detection is a common requirement in natural language processing. Put simply, it identifies the entities of interest in a document and then classifies them by relationship type. For example, extracting key information and the associations among that information from documents such as bills and logistics orders makes it possible to structure the document information and helps practitioners work more efficiently.
Starting from original documents such as bills and logistics orders, optical character recognition (OCR) is first used to recognize the text characters and their position information, and the characters are aggregated according to a distance threshold to obtain text block nodes. The document information can then be structured in two stages. The first stage uses a document information extraction model that takes the aggregated text block nodes as input and extracts key information blocks. The second stage takes the key information blocks as input and detects the relationships among the key information.
After the key information blocks are extracted, the most common approach at present is to write logical rules based on the position information of the key information blocks, judging the associations between key information blocks with manually set distance thresholds in the horizontal and vertical directions. Another approach divides the key information blocks into keys and values, where a key and a value either match or do not match, and builds a deep learning model that detects the associations of key-value pairs to obtain the associations between key information blocks.
However, judging associations between key information blocks with position-based logical rules is rather crude. The choice of distance threshold relies on manual experience and is strongly tied to the samples. The appropriate threshold differs between documents, and for some documents, such as those that do not follow a normal page layout, the relationship detection results are unreasonable.
In a normal document, information appears in rows that are separated from one another, and text in the same row can be regarded as having the simplest kind of association, namely row association. In a document that does not follow a normal page layout, however, some text may be too long or shifted in position so that it appears in another row; the distance threshold then assigns the text to the wrong row and the association detection fails. Document files often have loosely enforced layout requirements: a certain proportion have no table lines and no clear separation between rows and columns, so misaligned rows and columns are fairly common.
The other approach, which first distinguishes the keys and values of key information blocks and then detects key-value relationships, currently supports only two relationship types: matching and non-matching. In real business scenarios there are many types of relationship between texts, and match/no-match detection alone is too inflexible for practical use. Because this approach adds the constraint of a key-value relationship, it also cannot detect text relationships when a row or column consists entirely of values.
【Summary of the Invention】
This application provides a text relationship detection method, device and storage medium based on a graph convolutional neural network, which can solve the problem that determining associations with position-based logical rules may produce unreasonable detection results and is not flexible enough. This application provides the following technical solutions:
In a first aspect, a text relationship detection method based on a graph convolutional neural network is provided. The method includes:
acquiring multiple key information blocks of text information in a target image, where each key information block includes multiple text blocks and each text block includes at least one character string;
inputting the character string of each text block in each key information block into a node feature extraction model to obtain the node features of the key information block;
for each key information block among the multiple key information blocks, constructing connectivity relationships between each text block in the key information block and each text block in the other key information blocks;
determining the edge features of the key information block based on the connectivity relationships corresponding to each key information block and the position information corresponding to each connectivity relationship;
inputting the node features and the edge features into a pre-trained graph convolutional neural network to obtain the edge types between the key information blocks;
determining that key information blocks with the same edge type are associated.
Optionally, determining the edge features of the key information block based on the connectivity relationships corresponding to each key information block and the position information corresponding to each connectivity relationship includes:
for each connectivity relationship, determining the sub-edge feature corresponding to the connectivity relationship according to the relative position of the two text blocks that it connects;
for each key information block, determining the sub-edge features corresponding to the connections of each text block in the key information block;
generating the edge features based on the sub-edge features corresponding to each key information block.
Optionally, determining, for each connectivity relationship, the sub-edge feature corresponding to the connectivity relationship according to the relative position of the two text blocks that it connects includes:
discretizing the relative position by direction and distance to obtain a direction code and a distance code;
inputting the direction code and the distance code into an embedding model to obtain a direction embedding, a horizontal distance embedding and a vertical distance embedding;
concatenating the direction embedding, the horizontal distance embedding and the vertical distance embedding, and projecting the result to a fixed-length vector to obtain the sub-edge feature.
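The discretize, embed, concatenate and project steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the number of direction sectors, the distance bucket width, the embedding sizes, the use of box centers as the relative position, and the randomly initialized tables (standing in for learned embeddings and a learned projection) are all hypothetical choices.

```python
import math
import random

random.seed(0)

NUM_DIRECTIONS = 8   # assumed: eight 45-degree sectors
NUM_DIST_BINS = 10   # assumed: ten distance buckets per axis
BIN_WIDTH = 50.0     # assumed: bucket width in pixels
EMBED_DIM = 4        # assumed size of each embedding
OUT_DIM = 6          # assumed fixed length of a sub-edge feature


def make_table(rows, cols):
    """Random stand-in for a learned embedding table or projection matrix."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


DIR_TABLE = make_table(NUM_DIRECTIONS, EMBED_DIM)
H_TABLE = make_table(NUM_DIST_BINS, EMBED_DIM)
V_TABLE = make_table(NUM_DIST_BINS, EMBED_DIM)
PROJ = make_table(3 * EMBED_DIM, OUT_DIM)


def center(box):
    """Center of a box given as (x0, y0, x1, y1): top-left and bottom-right corners."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def direction_code(dx, dy):
    """Discretize the direction of the offset (dx, dy) into a sector index."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int(angle / (2 * math.pi / NUM_DIRECTIONS)) % NUM_DIRECTIONS


def distance_code(d):
    """Discretize an absolute offset along one axis into a bucket index."""
    return min(int(abs(d) / BIN_WIDTH), NUM_DIST_BINS - 1)


def sub_edge_feature(box_a, box_b):
    """Embed direction, horizontal and vertical distance, concatenate, then project."""
    (ax, ay), (bx, by) = center(box_a), center(box_b)
    dx, dy = bx - ax, by - ay
    concat = (DIR_TABLE[direction_code(dx, dy)]
              + H_TABLE[distance_code(dx)]
              + V_TABLE[distance_code(dy)])
    return [sum(concat[i] * PROJ[i][j] for i in range(len(concat)))
            for j in range(OUT_DIM)]
```

Calling `sub_edge_feature((0, 0, 40, 20), (200, 0, 260, 20))` returns a list of `OUT_DIM` floats for two text blocks on the same row. Whatever the distance between the boxes, the output length stays fixed, which is the property the projection step is meant to provide.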
Optionally, generating the edge features based on the sub-edge features corresponding to each key information block includes:
processing the sub-edge features into the same dimension;
processing the resulting sub-edge features into a first fixed dimension to obtain the edge features.
Optionally, generating the edge features based on the sub-edge features corresponding to each key information block includes:
for each key information block, obtaining a connectivity matching table made up of the connectivity relationships of the key information block;
looking up, according to the connectivity matching table, the corresponding set of sub-edge feature vectors in the vector table formed by the sub-edge features, to obtain the edge features of the key information block.
Optionally, inputting the character string of each text block in each key information block into the node feature extraction model to obtain the node features of the key information block includes:
for each text block, inputting the character string in the text block into a pre-trained recurrent neural network (RNN) to obtain the feature vector of each character string;
processing the feature vector of each character string into a second fixed dimension to obtain the node features.
Optionally, inputting the node features and the edge features into the pre-trained graph convolutional neural network to obtain the edge types between the key information blocks includes:
for each key information block, computing target node information through the graph convolutional neural network from the node features of the key information block together with the node features and edge features that have a connectivity relationship with the key information block;
concatenating the information of the nodes associated with an edge and computing the attributes of the edge through a multi-layer feed-forward network to obtain the edge type.
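The two steps above (update each node from its connected nodes and edges, then classify each edge from the nodes it joins) can be sketched as follows. This is a simplified illustration, not the network of this application: the mean aggregation, the single graph layer, the two-layer classifier, the feature size, the number of edge types and the random weights standing in for trained parameters are all assumptions.

```python
import random

random.seed(0)

DIM = 4        # assumed size of node and edge features
NUM_TYPES = 3  # assumed number of edge types


def rand_matrix(rows, cols):
    """Random stand-in for trained weights."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(vec, mat):
    """Multiply a row vector of length r by an r x c matrix."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]


def relu(vec):
    return [max(0.0, v) for v in vec]


W_NODE = rand_matrix(3 * DIM, DIM)   # graph layer weights
W_HID = rand_matrix(2 * DIM, DIM)    # edge classifier, hidden layer
W_OUT = rand_matrix(DIM, NUM_TYPES)  # edge classifier, output layer


def graph_layer(nodes, edges, edge_feats):
    """nodes: {id: vector}; edges: list of (u, v); edge_feats: {(u, v): vector}."""
    updated = {}
    for n, feat in nodes.items():
        nbr_sum, edge_sum, count = [0.0] * DIM, [0.0] * DIM, 0
        for (u, v) in edges:
            if n in (u, v):
                other = v if n == u else u
                nbr_sum = [a + b for a, b in zip(nbr_sum, nodes[other])]
                edge_sum = [a + b for a, b in zip(edge_sum, edge_feats[(u, v)])]
                count += 1
        scale = 1.0 / max(count, 1)
        # Own feature, mean neighbor feature and mean incident edge feature.
        message = (feat
                   + [s * scale for s in nbr_sum]
                   + [s * scale for s in edge_sum])
        updated[n] = relu(matvec(message, W_NODE))
    return updated


def classify_edges(nodes, edges):
    """Concatenate the two endpoint vectors and score each edge type."""
    types = {}
    for (u, v) in edges:
        hidden = relu(matvec(nodes[u] + nodes[v], W_HID))
        scores = matvec(hidden, W_OUT)
        types[(u, v)] = scores.index(max(scores))
    return types
```

With three key information blocks and edges `[(0, 1), (1, 2), (0, 2)]`, `classify_edges(graph_layer(nodes, edges, edge_feats), edges)` returns one type index per edge. Concatenation makes the classifier depend on endpoint order; a symmetric combination such as a sum would avoid that for an undirected graph.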
In a second aspect, a text relationship detection device based on a graph convolutional neural network is provided. The device includes:
a key information acquisition module, configured to acquire multiple key information blocks of text information in a target image, where each key information block includes multiple text blocks and each text block includes at least one character string;
a node feature extraction module, configured to input the character string of each text block in each key information block into a node feature extraction model to obtain the node features of the key information block;
a connectivity relationship construction module, configured to construct, for each key information block among the multiple key information blocks, connectivity relationships between each text block in the key information block and each text block in the other key information blocks;
an edge feature extraction module, configured to determine the edge features of the key information block based on the connectivity relationships corresponding to each key information block and the position information corresponding to each connectivity relationship;
an edge type calculation module, configured to input the node features and the edge features into a pre-trained graph convolutional neural network to obtain the edge types between the key information blocks;
an association determination module, configured to determine that key information blocks with the same edge type are associated.
In a third aspect, a text relationship detection device based on a graph convolutional neural network is provided. The device includes a processor and a memory; the memory stores a program, and the program is loaded and executed by the processor to implement the text relationship detection method based on a graph convolutional neural network described in the first aspect.
In a fourth aspect, a computer-readable storage medium is provided. The storage medium stores a program, and the program is loaded and executed by a processor to implement the text relationship detection method based on a graph convolutional neural network described in the first aspect.
The beneficial effects of this application are as follows. Multiple key information blocks of text information in a target image are acquired, where each key information block includes multiple text blocks and each text block includes at least one character string. The character string of each text block in each key information block is input into a node feature extraction model to obtain the node features of the key information block. For each key information block, connectivity relationships are constructed between each of its text blocks and each text block in the other key information blocks. The edge features of the key information block are determined from the connectivity relationships corresponding to each key information block and the position information corresponding to each connectivity relationship. The node features and edge features are input into a pre-trained graph convolutional neural network to obtain the edge types between the key information blocks, and key information blocks with the same edge type are determined to be associated. This solves the problem that determining associations with position-based logical rules may produce unreasonable detection results, and it improves the accuracy of association recognition. In addition, the relationship detection method provided by this application does not need to distinguish the keys and values of key information blocks; it detects associations directly, which can improve the efficiency of association detection.
The above is only an overview of the technical solutions of this application. So that the technical means of this application can be understood more clearly and implemented according to the specification, preferred embodiments of this application are described in detail below with reference to the accompanying drawings.
【Brief Description of the Drawings】
Figure 1 is a flow chart of a text relationship detection method based on a graph convolutional neural network provided by one embodiment of this application;
Figure 2 is a flow chart of a text relationship detection method based on a graph convolutional neural network provided by another embodiment of this application;
Figure 3 is a block diagram of a text relationship detection device based on a graph convolutional neural network provided by one embodiment of this application;
Figure 4 is a block diagram of a text relationship detection device based on a graph convolutional neural network provided by one embodiment of this application.
【Detailed Description】
Specific implementations of this application are described in further detail below with reference to the accompanying drawings and embodiments. The following embodiments illustrate this application but do not limit its scope.
First, several terms used in this application are introduced.
Optical Character Recognition (OCR): a recognition technology that converts the information in an image into text.
Text block node: a text block segmented according to a certain threshold, containing the text content, the text position and the related image background.
Key information block: a group of typed text block nodes that represents valuable information in a document, such as price or weight.
Node feature: the feature information of a key information block, encoded from the text content and type of the key information block.
Edge feature: the feature information of an edge connecting key information blocks.
Sub-edge feature: the feature information of an edge connecting text block nodes, encoded from the position information of the nodes.
Information extraction model: a model that takes text block nodes as input and extracts the key information blocks in a text document; a graph convolutional network is its core component.
Recurrent Neural Network (RNN): a neural network structure consisting of an input layer, a hidden layer and an output layer, in which the hidden layer carries state from one step of a sequence to the next.
Summary model: a hand-designed neural network structure that takes a set of feature vectors as input and outputs one fixed-dimensional vector representing the semantic information of the set.
Graph Convolutional Network (GCN): a neural network that applies graph convolution to graph-structured data; it can be used for graph embedding (Graph Embedding/Network Embedding, GE).
A graph $G = (V, E)$ has a node set $V$ and an edge set $E$. Each node $i$ has a feature vector $x_i$, and the features can be represented by a matrix $X_{N \times D}$, where $N$ is the number of nodes and $D$ is the number of features of each node, that is, the dimension of the feature vector.
Graph convolution is the process of determining the feature representation of the current node from its surrounding nodes. The surrounding nodes may be the neighbors of the current node, that is, nodes associated with the current node, or neighbors of those neighbors, and so on; this application does not limit the type of surrounding node.
Graph convolution can be expressed by the following nonlinear function:
$H^{l+1} = f(H^{l}, A)$
where $H^{0} = X$ is the input of the first layer, $X \in \mathbb{R}^{N \times D}$, $N$ is the number of nodes in the graph, $D$ is the dimension of each node's feature vector, $A$ is the adjacency matrix, and the function $f$ may be the same or different for different graph convolutional neural networks.
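The text leaves the function f unspecified. As an illustration only, one widely used choice is the propagation rule of the Kipf and Welling GCN; it is shown here as an example of such an f and is not necessarily the one used in this application. The symbols W, sigma, and the tilde quantities are not defined in the original text.

```latex
H^{l+1} = \sigma\!\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{l}\,W^{l}\right),
\qquad \tilde{A} = A + I_N,
\qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```

Here $W^{l}$ is the trainable weight matrix of layer $l$, $\sigma$ is a nonlinearity such as ReLU, $I_N$ is the identity matrix that adds a self-loop to every node, and $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$. The symmetric normalization keeps the scale of the aggregated features independent of how many neighbors a node has.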
Optionally, this application is described with the embodiments being executed by an electronic device with computing capability. The electronic device may be a terminal or a server, and the terminal may be a computer, a notebook computer, a tablet computer and so on; this embodiment does not limit the type of terminal or the type of electronic device.
The text relationship detection method provided in this embodiment is suitable for identifying the associations between key information blocks in a text document. Each kind of key information block corresponds to one kind of named entity. For example, a bill includes four kinds of key information block: product country of origin, product quantity, product unit price and product total price. There are multiple blocks of each kind, and this embodiment can detect the key information blocks that are associated (for example, that belong to the same product).
In practice, the key information blocks may come from various documents such as value-added tax invoices and insurance policies, or from other types of file, such as images of identity documents; this embodiment does not limit the type of key information block.
When there are several kinds of key information block and multiple blocks of each kind, accurately identifying the associated key information blocks becomes an urgent problem. To address it, the text relationship detection scheme provided by this application takes the key information blocks in a text document (such as a trade document or an identity document) as input, where each key information block consists of a group of typed text block nodes. An undirected graph is built with the key information blocks as nodes and the connectivity between key information blocks as edges, and a graph neural network encodes this graph to learn features and predict the type of each edge. The nodes and edges fed to the graph neural network need corresponding feature information extracted from the key information blocks: node features are extracted from the key information blocks themselves, and edge features are extracted from the edge connectivity between key information blocks. The node features and edge features are then input into the constructed graph neural network architecture, which encodes them to learn edge relationship feature vectors. Finally, edge types are predicted from the edge relationship feature vectors and aggregated to obtain the associations between key information blocks.
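The final aggregation step can be sketched as follows, assuming the edge types have already been predicted. The text does not say how the predictions are aggregated, so the approach here is an assumption: blocks joined by edges of the same type are merged into one group with a union-find structure, and type 0 is treated as a hypothetical "no relationship" label.

```python
from collections import defaultdict


def group_by_edge_type(num_blocks, edge_types, ignore_type=0):
    """Group key information blocks connected by edges of the same type.

    edge_types maps (u, v) block index pairs to a predicted type id.
    Returns {type_id: sorted list of groups with at least two blocks}.
    """
    # One union-find forest per edge type, created on first use.
    parents = defaultdict(lambda: list(range(num_blocks)))

    def find(parent, x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), t in edge_types.items():
        if t == ignore_type:
            continue
        parent = parents[t]
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[rv] = ru

    result = {}
    for t, parent in parents.items():
        groups = defaultdict(list)
        for x in range(num_blocks):
            groups[find(parent, x)].append(x)
        result[t] = sorted(g for g in groups.values() if len(g) > 1)
    return result
```

For example, with five blocks and predictions `{(0, 1): 1, (1, 2): 1, (3, 4): 1, (0, 3): 0, (2, 4): 2}`, the function returns `{1: [[0, 1, 2], [3, 4]], 2: [[2, 4]]}`: two groups under type 1 (say, two product rows) and one pair under type 2.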
Because associations can be detected automatically from the node features and edge features of the key information blocks, this solves the problem that determining associations with position-based logical rules may produce unreasonable detection results, and it improves the accuracy of association recognition.
In addition, the relationship detection method provided by this application does not need to distinguish the keys and values of key information blocks; it detects associations directly, which can improve the efficiency of association detection.
The text relationship detection method based on a graph convolutional neural network provided by this application is described in detail below.
Figure 1 is a flow chart of a text relationship detection method based on a graph convolutional neural network provided by one embodiment of this application. The method includes at least the following steps:
Step 101: acquire multiple key information blocks of text information in a target image, where each key information block includes multiple text blocks and each text block includes at least one character string.
A key information block is information that needs to be extracted from the text information of the target image, in other words, information the user cares about.
Optionally, the key information blocks may be recognized by the electronic device itself, for example with an information extraction model stored on the device that takes text block nodes as input and extracts the key information blocks in the text document; or they may be sent by another device. This embodiment does not limit how the key information blocks are obtained.
In this embodiment, a key information block consists of a group of typed text blocks. A text block contains two kinds of information: its text content and its position information.
Optionally, the position information is a text box defined by the upper-left and lower-right corners of the text block. Alternatively, the position information may be the pixel positions covered by the text block; this embodiment does not limit how the position information is implemented.
Optionally, the electronic device may also convert the data format of the key information blocks so that the converted format suits the subsequent steps. Data format conversion includes, but is not limited to, building the association index between key information blocks and text block nodes and formatting the position coordinates of text block nodes; this embodiment does not limit the specific content of the data format conversion.
Step 102: Input the character string of each text block in each key information block into a node feature extraction model to obtain the node feature of the key information block.
In this embodiment, the node feature of a key information block is extracted from its type information and its corresponding group of text blocks.
In one example, inputting the character string of each text block in each key information block into the node feature extraction model to obtain the node feature of the key information block includes: for each text block, inputting the character string in the text block into a pre-trained RNN to obtain the feature vector of each character string; and processing the feature vector of each character string into a second fixed dimension to obtain the node feature.
Specifically, for the character strings corresponding to a group of text blocks input to the node feature extraction model, each string is first split into characters to obtain a vectorized representation of each character. The character vectors are then encoded by the RNN, and a Summary model combines them by weighted superposition into a feature vector of fixed dimension. This feature vector represents the text feature of the text block node.
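The character-to-vector, RNN, and Summary stages can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the patented model: the weights are random rather than trained, the RNN is a simple Elman-style recurrence, and the "weighted superposition" is assumed to be a softmax-weighted sum scored against a hypothetical `query` vector. All function names are invented for the example.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def char_vectors(text: str, table: Dict[str, Vector], dim: int,
                 rng: random.Random) -> List[Vector]:
    """Look up (or lazily create) one vector per character."""
    for ch in text:
        if ch not in table:
            table[ch] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return [table[ch] for ch in text]


def rnn_encode(inputs: List[Vector], w_in, w_rec) -> List[Vector]:
    """Elman-style recurrence: h_t = tanh(W_in x_t + W_rec h_{t-1})."""
    hidden = len(w_in)
    h = [0.0] * hidden
    states = []
    for x in inputs:
        h = [math.tanh(sum(w_in[i][j] * x[j] for j in range(len(x)))
                       + sum(w_rec[i][k] * h[k] for k in range(hidden)))
             for i in range(hidden)]
        states.append(h)
    return states


def summary_pool(states: List[Vector], query: Vector) -> Vector:
    """Softmax-weighted sum of the states (assumed form of the Summary model).

    The output length equals len(query) regardless of how many states there are.
    Requires at least one state, i.e. a non-empty string.
    """
    scores = [sum(q * s for q, s in zip(query, st)) for st in states]
    top = max(scores)
    exps = [math.exp(sc - top) for sc in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * st[i] for w, st in zip(weights, states))
            for i in range(len(query))]


rng = random.Random(0)
CHAR_DIM, HIDDEN = 4, 6
table: Dict[str, Vector] = {}
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(CHAR_DIM)] for _ in range(HIDDEN)]
w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
query = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)]


def text_feature(text: str) -> Vector:
    chars = char_vectors(text, table, CHAR_DIM, rng)
    return summary_pool(rnn_encode(chars, w_in, w_rec), query)


short_vec = text_feature("No.")
long_vec = text_feature("Invoice number 2021-0301")
```

Strings of different lengths map to vectors of the same size, which is the property the fixed-dimension requirement relies on.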
The above example takes a node feature extraction model that includes an RNN and a Summary model. In an actual implementation, the node feature extraction model may also be another type of neural network, for example computing the feature vector of a string with a linear regression model, or computing feature vectors with word2vector. This embodiment does not limit the implementation of the node feature extraction model.
Optionally, the electronic device randomly initializes a vector table of node types in advance and looks up a vector in the table using the index corresponding to the node type of the key information block; this vector represents the type feature of the key information block. The type feature is then expanded to the same dimension as the text features and concatenated with them into a new set of feature vectors, which is encoded by the RNN and the Summary model into a feature vector of fixed dimension. This extracted feature vector represents the node feature of the key information block.
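The type lookup and concatenation can be sketched as follows. Several choices here are assumptions made only for illustration: the type vocabulary `NODE_TYPES` is made up, "expanding" is implemented as zero-padding, and a plain mean stands in for the RNN and Summary encoding of the combined sequence.

```python
import random
from typing import List

Vector = List[float]

NODE_TYPES = ["consignee", "address", "amount"]   # hypothetical type vocabulary
TYPE_DIM, TEXT_DIM = 3, 5

rng = random.Random(42)
# Randomly initialised table with one row per node type (it would be trained in practice).
type_table = [[rng.gauss(0.0, 1.0) for _ in range(TYPE_DIM)] for _ in NODE_TYPES]


def type_feature(node_type: str) -> Vector:
    """Look up the type vector by the index of the node type."""
    return type_table[NODE_TYPES.index(node_type)]


def expand(vec: Vector, dim: int) -> Vector:
    """Zero-pad a vector up to `dim` (one possible way to expand it)."""
    if len(vec) > dim:
        raise ValueError("vector is longer than the target dimension")
    return vec + [0.0] * (dim - len(vec))


def node_sequence(node_type: str, text_features: List[Vector]) -> List[Vector]:
    """Put the expanded type feature in front of the text features."""
    return [expand(type_feature(node_type), TEXT_DIM)] + text_features


def mean_pool(seq: List[Vector]) -> Vector:
    """Stand-in for the RNN + Summary encoder: column-wise mean."""
    return [sum(col) / len(seq) for col in zip(*seq)]


text_feats = [[0.1] * TEXT_DIM, [0.3] * TEXT_DIM]
seq = node_sequence("address", text_feats)
node_feat = mean_pool(seq)
```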
Step 103: For each key information block among the multiple key information blocks, construct the connectivity relationships between each text block in that key information block and each text block in the other key information blocks.
A key information block consists of a group of text blocks and therefore carries multiple pieces of text position information. One strategy is to construct a position region large enough to cover all of its text block nodes, but such a region describes the position imprecisely and may overlap with the regions of other key information nodes. For this reason, in this embodiment the connectivity relationship between two key information blocks is defined as the sum of the pairwise connectivity relationships between the two groups of text block nodes. That is, if the first key information block contains M text blocks and the second contains N text blocks, there are M*N text block connectivity relationships between the two key information blocks, and the sum of these relationships represents the connectivity relationship between the two key information nodes, where M and N are integers greater than or equal to 1.
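The M*N pairing can be enumerated directly. The sketch below assumes text blocks are referred to by flat integer indices and that connections are directed (ordered pairs), both of which are illustrative choices; the function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

SubEdge = Tuple[int, int]


def sub_edges(block_a: List[int], block_b: List[int]) -> List[SubEdge]:
    """All pairwise text block connections between two key information blocks."""
    return list(product(block_a, block_b))


def all_connections(owner: List[List[int]]) -> Dict[Tuple[int, int], List[SubEdge]]:
    """Map every ordered pair of distinct key blocks to its M*N sub-edges."""
    table = {}
    for i, a in enumerate(owner):
        for j, b in enumerate(owner):
            if i != j:
                table[(i, j)] = sub_edges(a, b)
    return table


# Three key information blocks owning 2, 3 and 1 text blocks respectively.
owner = [[0, 1], [2, 3, 4], [5]]
connections = all_connections(owner)
```

With M = 2 and N = 3, the entry `connections[(0, 1)]` holds the six sub-edges between the first and second key information blocks.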
Step 104: Determine the edge features of each key information block based on the connectivity relationships corresponding to that key information block and the position information corresponding to each connectivity relationship.
Optionally, steps 103 and 104 may be executed after step 102, before step 102, or at the same time as step 102. This embodiment does not limit the execution order between steps 103 and 104 and step 102.
In one example, determining the edge features of the key information block based on the connectivity relationships corresponding to each key information block and the position information corresponding to each connectivity relationship includes: for each connectivity relationship, determining the sub-edge feature corresponding to the connectivity relationship according to the relative position between the two text blocks that it connects; for each key information block, determining the sub-edge features corresponding to the connections of each text block in the key information block; and generating the edge feature based on the sub-edge features corresponding to each key information block.
Here, for each connectivity relationship, determining the sub-edge feature according to the relative position between the two connected text blocks includes: discretizing the relative position by direction and distance to obtain a direction code and distance codes; inputting the direction code and the distance codes into an embedding model to obtain a direction embedding, a horizontal distance embedding, and a vertical distance embedding; and concatenating the direction embedding, the horizontal distance embedding, and the vertical distance embedding and projecting the result to a fixed-length vector to obtain the sub-edge feature.
The embedding model may be a pre-trained embedding layer.
For example, the relative position of two connected text blocks (the vector joining their centers) is discretized by direction and distance. Discretization here means the following. For direction, the angle is divided into multiple directions (for example 360 directions, with adjacent directions differing by 1°). For distance, the vertical and horizontal distances are divided by the length and width of the target image, respectively, to obtain normalized vertical and horizontal distances, which are then multiplied by 1000 and rounded. This yields an integer code for the direction and integer codes for the distances. The direction, horizontal, and vertical integer codes are passed through the embedding model to compute the corresponding three embeddings. The embeddings are concatenated and projected to a fixed-length vector, which serves as an edge feature of the directed graph.
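The discretization and embedding steps just described can be sketched as below. This is a sketch under assumptions: distances are treated as unsigned because the direction is coded separately, the angle is measured with `math.atan2` in image coordinates, and the embedding tables and projection matrix are random stand-ins for trained parameters. All names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)


def center(box: Box) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def discretize(src: Box, dst: Box, img_w: float, img_h: float,
               n_dirs: int = 360, scale: int = 1000) -> Tuple[int, int, int]:
    """Return (direction code, horizontal distance code, vertical distance code)."""
    (sx, sy), (dx, dy) = center(src), center(dst)
    vx, vy = dx - sx, dy - sy                      # center-to-center vector
    angle = math.degrees(math.atan2(vy, vx)) % 360.0
    direction = int(angle // (360.0 / n_dirs)) % n_dirs
    horizontal = int(round(abs(vx) / img_w * scale))
    vertical = int(round(abs(vy) / img_h * scale))
    return direction, horizontal, vertical


rng = random.Random(0)
EMB, OUT = 4, 8


def rand_table(rows: int, cols: int) -> List[List[float]]:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


dir_table = rand_table(360, EMB)       # one row per direction code
hor_table = rand_table(1001, EMB)      # codes 0..1000
ver_table = rand_table(1001, EMB)
projection = rand_table(OUT, 3 * EMB)  # maps the concatenation to a fixed length


def sub_edge_feature(codes: Tuple[int, int, int]) -> List[float]:
    d, h, v = codes
    concat = dir_table[d] + hor_table[min(h, 1000)] + ver_table[min(v, 1000)]
    return [sum(w * x for w, x in zip(row, concat)) for row in projection]


a: Box = (0, 0, 100, 20)
b: Box = (200, 100, 300, 120)
codes_ab = discretize(a, b, img_w=1000, img_h=500)
codes_ba = discretize(b, a, img_w=1000, img_h=500)
feature_ab = sub_edge_feature(codes_ab)
```

Because the direction code differs between `codes_ab` and `codes_ba`, the two directions of a connection receive different features, which matches the directed-graph reading of the text.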
Optionally, edge feature extraction consists of two steps: first the sub-edge features between text blocks are extracted, and then the edge features between key information blocks are obtained by looking up the vector table of sub-edge features and encoding the result.
Here, obtaining the edge features by looking up and encoding from the vector table of sub-edge features includes: for each key information block, obtaining a connectivity matching table composed of the connectivity relationships of the key information block; and, according to the connectivity matching table, looking up the corresponding set of sub-edge feature vectors in the vector table composed of sub-edge features to obtain the edge feature of the key information block.
Optionally, after obtaining the set of sub-edge feature vectors, the electronic device processes the sub-edge features into the same dimension and then processes the resulting sub-edge features into a first fixed dimension to obtain the edge feature.
The following example illustrates how the edge features of a key information block are determined:
1) Compute the sub-edge feature vectors between text blocks, and build a vector table of sub-edge feature vectors from the sub-edge connections between text blocks;
2) Look up the sub-edge connectivity matching table between key information blocks in the sub-edge feature vector table to obtain a vectorized representation;
3) Expand the vectorized set of sub-edge features between key information blocks to the same dimension, then process it with the Summary model into a feature vector of the first fixed dimension; this vector represents the edge feature of the key information block.
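The three steps above can be sketched as a table lookup followed by pooling. The sketch makes two assumptions that are not from the source: zero-padding is used to bring vectors to a common dimension, and a column-wise mean stands in for the Summary model. The function name `edge_feature` is hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
SubEdge = Tuple[int, int]


def edge_feature(matching: List[SubEdge], table: Dict[SubEdge, Vector],
                 out_dim: int) -> Vector:
    """Look up sub-edge vectors, pad them to one width, and pool to `out_dim`."""
    vectors = [table[pair] for pair in matching]             # step 2: lookup
    width = max(len(v) for v in vectors)
    padded = [v + [0.0] * (width - len(v)) for v in vectors]  # step 3: same dimension
    pooled = [sum(col) / len(padded) for col in zip(*padded)]
    return (pooled + [0.0] * out_dim)[:out_dim]               # fixed output dimension


# Step 1 (assumed already done): a table of sub-edge feature vectors.
sub_edge_table = {(0, 2): [1.0, 2.0], (1, 2): [3.0, 4.0]}
# Matching table for one pair of key information blocks.
matching_table = [(0, 2), (1, 2)]
feature = edge_feature(matching_table, sub_edge_table, out_dim=3)
```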
Step 105: Input the node features and the edge features into a pre-trained graph convolutional neural network to obtain the edge types between the key information blocks.
The inputs of the graph convolutional neural network are the node features and the edge features. Computing the edge type includes: for each key information block, computing target node information with the graph convolutional neural network from the node feature of the key information block together with the node features and edge features that have a connectivity relationship with it; and concatenating the information of the nodes associated with an edge and computing the attributes of the edge through a multi-layer feed-forward network to obtain the edge type.
The computation of target node information and of edge attributes is repeated multiple times to enlarge the receptive field and obtain high-level semantic features.
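The alternation between node updates and edge updates can be illustrated with a toy message-passing loop. This is a heavily simplified sketch, not the patent's network: node and edge features are assumed to share one dimension, a single linear map with `tanh` stands in for the multi-layer feed-forward network, and all weights are random, so the predicted types carry no meaning. Every name below is hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Edge = Tuple[int, int]


def linear(weights: List[Vector], vec: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def propagate(nodes: List[Vector], edges: Dict[Edge, Vector]) -> List[Vector]:
    """Update each node from its own feature and its incoming neighbours and edges."""
    updated = []
    for i, h in enumerate(nodes):
        msgs = [[n + e for n, e in zip(nodes[src], feat)]
                for (src, dst), feat in edges.items() if dst == i]
        if msgs:
            agg = [sum(col) / len(msgs) for col in zip(*msgs)]
        else:
            agg = [0.0] * len(h)
        updated.append([math.tanh(0.5 * a + 0.5 * b) for a, b in zip(h, agg)])
    return updated


def update_edges(nodes: List[Vector], edges: Dict[Edge, Vector],
                 edge_w: List[Vector]) -> Dict[Edge, Vector]:
    """Recompute each edge from the concatenation of its two end nodes."""
    return {(s, d): [math.tanh(v) for v in linear(edge_w, nodes[s] + nodes[d])]
            for (s, d) in edges}


def predict_edge_types(nodes, edges, edge_w, cls_w, rounds: int = 2) -> Dict[Edge, int]:
    for _ in range(rounds):                 # repeated to widen the receptive field
        nodes = propagate(nodes, edges)
        edges = update_edges(nodes, edges, edge_w)
    result = {}
    for pair, feat in edges.items():
        logits = linear(cls_w, feat)
        result[pair] = logits.index(max(logits))
    return result


rng = random.Random(7)
D, N_TYPES = 4, 3


def rand_vec(n: int) -> Vector:
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


nodes = [rand_vec(D) for _ in range(3)]
edges = {(s, d): rand_vec(D) for s in range(3) for d in range(3) if s != d}
edge_w = [rand_vec(2 * D) for _ in range(D)]
cls_w = [rand_vec(D) for _ in range(N_TYPES)]
predicted = predict_edge_types(nodes, edges, edge_w, cls_w)
```

Each extra round lets a node see information from one hop further away, which is the sense in which repetition enlarges the receptive field.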
Step 106: Determine that key information blocks with the same edge type have an association relationship.
Key information blocks that are connected by edges of the same type are aggregated into a group, and the group indicates that a certain association relationship exists among its key information blocks; the association relationship here is the detected edge type. The key information blocks and the edge type predictions are aggregated into a set of typed node relationships. This relationship set is a structured information extraction result, that is, the inter-text relationships to be detected.
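One way to perform this aggregation is to take connected components separately for each edge type. The sketch below uses a union-find structure for that; treating type `0` as "no relation" is an assumption introduced for the example, as are the function name and the sample predictions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

NO_RELATION = 0   # assumed label meaning "these two blocks are unrelated"


def group_by_edge_type(edge_types: Dict[Tuple[int, int], int]) -> List[Tuple[int, List[int]]]:
    """Return (edge type, sorted member blocks) for every connected group."""
    parent: Dict[Tuple[int, int], Tuple[int, int]] = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for (a, b), t in edge_types.items():
        if t == NO_RELATION:
            continue
        ra, rb = find((t, a)), find((t, b))  # components are kept per edge type
        if ra != rb:
            parent[ra] = rb

    groups = defaultdict(set)
    for t, node in list(parent):
        groups[find((t, node))].add(node)
    return sorted((root[0], sorted(members)) for root, members in groups.items())


# Made-up predictions: blocks 0, 1, 2 share type 1; blocks 3, 4 share type 2.
edge_types = {(0, 1): 1, (1, 2): 1, (3, 4): 2, (0, 3): 0, (2, 0): 1}
groups = group_by_edge_type(edge_types)
```

Each returned pair is a typed group of key information blocks, which corresponds to one entry of the structured relationship set.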
To make the text relationship detection method based on a graph convolutional neural network provided by this application easier to understand, the method is illustrated below with an example, with reference to Figure 2. Figure 2 takes the key information blocks in a document file as the input. Text features are extracted from the text content of each key information block, and the type of each key information block is encoded as its type feature. The text features and the type feature of a key information block are combined and encoded to obtain its node feature. All key information blocks are split apart to obtain the set of candidate text blocks; sub-edge features between every pair of text blocks are extracted from the position information of the text blocks and used to build a sub-edge feature vector table. The edge feature between a pair of key information blocks is regarded as the aggregation of the sub-edge features between their two corresponding groups of text blocks; that is, each edge feature consists of a group of sub-edge features, which can be looked up in the sub-edge feature vector table through the indices of that group of sub-edges. After the group of sub-edge feature vectors corresponding to an edge feature is retrieved, it is encoded to obtain the edge feature between the key information blocks. The key information block features and the edge features between key information blocks are encoded by the constructed graph neural network layers to learn feature vectors of the edge relationships. These feature vectors are used to detect the type of each edge connection between key information blocks, and the key information blocks are aggregated by edge type into groups of typed key information block sets.
Optionally, steps 101 to 105 may be implemented in a single network model; in that case, the input of the network model is the key information blocks and the output is the edge types.
In summary, the text relationship detection method based on a graph convolutional neural network provided by this embodiment obtains multiple key information blocks from the text information in the target image, where each key information block includes multiple text blocks and each text block includes at least one character string; inputs the character string of each text block in each key information block into the node feature extraction model to obtain the node feature of the key information block; for each key information block, constructs the connectivity relationships between each of its text blocks and each text block in the other key information blocks; determines the edge features of the key information block based on the connectivity relationships corresponding to it and the position information corresponding to each connectivity relationship; inputs the node features and edge features into the pre-trained graph convolutional neural network to obtain the edge types between the key information blocks; and determines that key information blocks with the same edge type have an association relationship. This addresses the problem that determining association relationships with logical rules based on position information may produce unreasonable detection results, and it improves the accuracy of association relationship recognition. In addition, the relationship detection method provided by this application does not need to distinguish the keys and values of key information blocks and detects association relationships directly, which can improve the efficiency of association relationship detection.
Figure 3 is a block diagram of a text relationship detection device based on a graph convolutional neural network provided by one embodiment of this application. The device includes at least the following modules: a key information acquisition module 310, a node feature extraction module 320, a connectivity relationship construction module 330, an edge feature extraction module 340, an edge type calculation module 350, and an association relationship determination module 360.
The key information acquisition module 310 is configured to obtain multiple key information blocks from the text information in the target image, where each key information block includes multiple text blocks and each text block includes at least one character string;
The node feature extraction module 320 is configured to input the character string of each text block in each key information block into the node feature extraction model to obtain the node feature of the key information block;
The connectivity relationship construction module 330 is configured to, for each key information block among the multiple key information blocks, construct the connectivity relationships between each text block in the key information block and each text block in the other key information blocks;
The edge feature extraction module 340 is configured to determine the edge features of the key information block based on the connectivity relationships corresponding to each key information block and the position information corresponding to each connectivity relationship;
The edge type calculation module 350 is configured to input the node features and the edge features into the pre-trained graph convolutional neural network to obtain the edge types between the key information blocks;
The association relationship determination module 360 is configured to determine that key information blocks with the same edge type have an association relationship.
For relevant details, refer to the method embodiments above.
It should be noted that, when the text relationship detection device based on a graph convolutional neural network provided in the above embodiment performs text relationship detection, the division into the functional modules above is only an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device embodiment and the method embodiment of text relationship detection based on a graph convolutional neural network belong to the same concept; the specific implementation process is described in the method embodiments and is not repeated here.
Figure 4 is a block diagram of a text relationship detection device based on a graph convolutional neural network provided by one embodiment of this application. The device includes at least a processor 401 and a memory 402.
The processor 401 may include one or more processing cores, for example a 4-core processor or an 8-core processor. The processor 401 may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 401 may also include a main processor and a coprocessor. The main processor, also called the CPU (Central Processing Unit), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 401 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 401 may also include an AI (Artificial Intelligence) processor, which handles computing operations related to machine learning.
The memory 402 may include one or more computer-readable storage media, which may be non-transitory. The memory 402 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 402 stores at least one instruction, which is executed by the processor 401 to implement the text relationship detection method based on a graph convolutional neural network provided by the method embodiments of this application.
In some embodiments, the text relationship detection device based on a graph convolutional neural network may optionally further include a peripheral device interface and at least one peripheral device. The processor 401, the memory 402, and the peripheral device interface may be connected by a bus or signal lines. Each peripheral device may be connected to the peripheral device interface by a bus, a signal line, or a circuit board. Illustratively, the peripheral devices include, but are not limited to, a radio frequency circuit, a touch display screen, an audio circuit, and a power supply.
Of course, the text relationship detection device based on a graph convolutional neural network may include fewer or more components, which is not limited in this embodiment.
Optionally, this application also provides a computer-readable storage medium in which a program is stored; the program is loaded and executed by a processor to implement the text relationship detection method based on a graph convolutional neural network of the above method embodiments.
Optionally, this application also provides a computer product that includes a computer-readable storage medium in which a program is stored; the program is loaded and executed by a processor to implement the text relationship detection method based on a graph convolutional neural network of the above method embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be determined by the appended claims.
The above is only one specific embodiment of this application; any other improvement made on the basis of the concept of this application is deemed to fall within the protection scope of this application.
Claims (8)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110224515.XA CN112949476B (en) | 2021-03-01 | 2021-03-01 | Text relation detection method, device and storage medium based on graph convolution neural network |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112949476A (en) | 2021-06-11 |
| CN112949476B (en) | 2023-09-29 |
Family
ID=76246856
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110224515.XA Active CN112949476B (en) | 2021-03-01 | 2021-03-01 | Text relation detection method, device and storage medium based on graph convolution neural network |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112949476B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20240177514A1 (en) * | 2022-11-29 | 2024-05-30 | Microsoft Technology Licensing, Llc | Learning a form structure |
Families Citing this family (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111027563B (en) * | 2019-12-09 | 2024-12-24 | 腾讯云计算(北京)有限责任公司 | A text detection method, device and recognition system |
| CN114153940B (en) * | 2021-10-29 | 2025-09-09 | 北京搜狗科技发展有限公司 | Text matching method, device, electronic equipment, medium and program product |
| CN114037985A (en) * | 2021-11-04 | 2022-02-11 | 北京有竹居网络技术有限公司 | Information extraction method, device, equipment, medium and product |
| CN114153959B (en) * | 2021-12-08 | 2025-06-03 | 北京有竹居网络技术有限公司 | Key value matching method, device, readable medium and electronic device |
| CN114283403B (en) * | 2021-12-24 | 2024-01-16 | 北京有竹居网络技术有限公司 | An image detection method, device, storage medium and equipment |
| CN114219876B (en) * | 2022-02-18 | 2022-06-24 | 阿里巴巴达摩院(杭州)科技有限公司 | Text merging method, device, equipment and storage medium |
| CN114842492B (en) * | 2022-04-29 | 2025-10-21 | 北京鼎事兴软件科技有限公司 | Key information extraction method, device, storage medium and electronic device |
| CN114782943B (en) * | 2022-05-13 | 2025-01-14 | 广州欢聚时代信息科技有限公司 | Method for extracting bill information and its device, equipment, medium and product |
| CN115116060B (en) * | 2022-08-25 | 2023-01-24 | 深圳前海环融联易信息科技服务有限公司 | Key value file processing method, device, equipment and medium |
| CN116403038A (en) * | 2023-03-31 | 2023-07-07 | 阿里巴巴(中国)有限公司 | Data relationship identification method and data processing method for data relationship identification |
| CN118643801A (en) * | 2024-08-15 | 2024-09-13 | 深圳市智慧城市科技发展集团有限公司 | Method, device and readable storage medium for converting presentation content |
Citations (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CA2185827A1 (en) * | 1991-12-23 | 1993-06-24 | Chinmoy Bhusan Bose | Method and Apparatus for Connected and Degraded Text Recognition |
| CN109062874A (en) * | 2018-06-12 | 2018-12-21 | 平安科技(深圳)有限公司 | Acquisition methods, terminal device and the medium of financial data |
| CN110825845A (en) * | 2019-10-23 | 2020-02-21 | 中南大学 | A Hierarchical Text Classification Method Based on Character and Self-Attention Mechanism and Chinese Text Classification Method |
| CN111553837A (en) * | 2020-04-28 | 2020-08-18 | 武汉理工大学 | An Artistic Text Image Generation Method Based on Neural Style Transfer |
| CN111581377A (en) * | 2020-04-23 | 2020-08-25 | 广东博智林机器人有限公司 | Text classification method and device, storage medium and computer equipment |
| CN111597943A (en) * | 2020-05-08 | 2020-08-28 | 杭州火石数智科技有限公司 | Table structure identification method based on graph neural network |
| CN111652162A (en) * | 2020-06-08 | 2020-09-11 | 成都知识视觉科技有限公司 | Text detection and identification method for medical document structured knowledge extraction |
| CN111784802A (en) * | 2020-07-30 | 2020-10-16 | 支付宝(杭州)信息技术有限公司 | Image generation method, device and equipment |
| CN111860257A (en) * | 2020-07-10 | 2020-10-30 | 上海交通大学 | A table recognition method and system integrating various text features and geometric information |
| CN111967387A (en) * | 2020-08-17 | 2020-11-20 | 北京市商汤科技开发有限公司 | Form recognition method, device, equipment and computer readable storage medium |
| CN112215236A (en) * | 2020-10-21 | 2021-01-12 | 科大讯飞股份有限公司 | Text recognition method and device, electronic equipment and storage medium |
| CN112241481A (en) * | 2020-10-09 | 2021-01-19 | 中国人民解放军国防科技大学 | Cross-modal news event classification method and system based on graph neural network |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10839291B2 (en) * | 2017-07-01 | 2020-11-17 | Intel Corporation | Hardened deep neural networks through training from adversarial misclassified data |
| KR102535411B1 (en) * | 2017-11-16 | 2023-05-23 | 삼성전자주식회사 | Apparatus and method related to metric learning based data classification |
Non-Patent Citations (3)
| Title |
|---|
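| Frame type detection method based on the BERT model; Gao Lizheng; Zhou Gang; Luo Junyong; Huang Yongzhong; Journal of Information Engineering University (No. 02); full text * |
| A survey of image-based information hiding detection algorithms and implementation techniques; Xia Yu, Lang Rongling, Cao Weibing, Dai Guanzhong; Journal of Computer Research and Development (No. 04); full text * |
| Research on entity relation extraction methods in genealogy texts; Ren Ming; Xu Guang; Wang Wenxiang; Journal of Chinese Information Processing (No. 06); full text * |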
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20240177514A1 (en) * | 2022-11-29 | 2024-05-30 | Microsoft Technology Licensing, Llc | Learning a form structure |
Also Published As
| Publication number | Publication date |
|---|---|
| CN112949476A (en) | 2021-06-11 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112949476B (en) | Text relation detection method, device and storage medium based on graph convolution neural network | |
| JP7289047B2 (en) | Method, computer program and system for block-based document metadata extraction | |
| US12118813B2 (en) | Continuous learning for document processing and analysis | |
| CN107004159B (en) | Active machine learning | |
| CN111680491B (en) | Method and device for extracting document information and electronic equipment | |
| US12118816B2 (en) | Continuous learning for document processing and analysis | |
| CN111626048A (en) | Text error correction method, device, equipment and storage medium | |
| CN114780746A (en) | Knowledge graph-based document retrieval method and related equipment thereof | |
| US20220139098A1 (en) | Identification of blocks of associated words in documents with complex structures | |
| CN112307749B (en) | Text error detection method, text error detection device, computer equipment and storage medium | |
| CN115917613A (en) | Semantic representation of text in a document | |
| CN115130989A (en) | Method, device and equipment for auditing service document and storage medium | |
| CN113360654B (en) | Text classification method, apparatus, electronic device and readable storage medium | |
| CN112949477A (en) | Information identification method and device based on graph convolution neural network and storage medium | |
| CN114724156B (en) | Form identification method and device and electronic equipment | |
| Ali et al. | Sentiment Analysis of Low-Resource Language Literature Using Data Processing and Deep Learning. | |
| CN113591881B (en) | Intention recognition method and device based on model fusion, electronic equipment and medium | |
| CN111488732B (en) | A method, system and related equipment for detecting deformed keywords | |
| CN115455171A (en) | Method, device, equipment and medium for mutual retrieval and model training of text videos | |
| CN114495113A (en) | Text classification method and training method and device for text classification model | |
| CN117633245B (en) | Knowledge graph construction method, device, electronic device and storage medium | |
| US9875336B2 (en) | Spatial arithmetic method of sequence alignment | |
| CN115410185B (en) | A method for extracting specific person and organization names from multimodal data | |
| CN112632948A (en) | Case document ordering method and related equipment | |
| CN115129871B (en) | Text category determining method, apparatus, computer device and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| TR01 | Transfer of patent right | Effective date of registration: 20250703. Address after: 210000 Jiangsu Province, Nanjing City, Qilin Science and Technology Innovation Park, Tianjiao Road 100, Room 501-2, Building B, Jiangsu Nanjing Qiaomeng Yuan. Patentee after: Yunwan Intelligent Computing (Nanjing) Technology Co.,Ltd. Country or region after: China. Address before: 215123 Jiangsu Province Suzhou City Suzhou Industrial Park Jinji Lake Avenue 88 Phase 7 G1-902 Unit. Patentee before: Suzhou meinenghua Intelligent Technology Co.,Ltd. Country or region before: China |