
CN112949476A - Text relation detection method and device based on graph convolution neural network and storage medium - Google Patents


Info

Publication number
CN112949476A
CN112949476A
Authority
CN
China
Prior art keywords
key information
text
edge
block
blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110224515.XA
Other languages
Chinese (zh)
Other versions
CN112949476B (en)
Inventor
熊玉竹
侯绍东
周以晴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunwan Intelligent Computing (Nanjing) Technology Co.,Ltd.
Original Assignee
Suzhou Meinenghua Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Meinenghua Intelligent Technology Co ltd filed Critical Suzhou Meinenghua Intelligent Technology Co ltd
Priority to CN202110224515.XA priority Critical patent/CN112949476B/en
Publication of CN112949476A publication Critical patent/CN112949476A/en
Application granted granted Critical
Publication of CN112949476B publication Critical patent/CN112949476B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/413Classification of content, e.g. text, photographs or tables
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


Figure 202110224515

The present application relates to a text relationship detection method, device and storage medium based on a graph convolutional neural network, and belongs to the field of computer technology. The method includes: acquiring a plurality of key information blocks of the text information in a target image, where each text block in a key information block includes at least one character string; inputting the character string of each text block in each key information block into a node feature extraction model to obtain the node features of the key information block; constructing the connectivity relations between each text block in a key information block and each text block in the other key information blocks; determining the edge features of the key information blocks based on the connectivity relations corresponding to each key information block and the position information corresponding to each connectivity relation; inputting the node features and edge features into a pre-trained graph convolutional neural network to obtain the edge types between the key information blocks; and determining that key information blocks with the same edge type are associated. This can improve the accuracy and efficiency of association identification.


Description

Text relation detection method and device based on graph convolution neural network and storage medium
[ technical field ]
The application relates to a text relation detection method and device based on a graph convolution neural network and a storage medium, and belongs to the technical field of computers.
[ background of the invention ]
Text relationship detection is a common requirement in the field of natural language processing. Briefly, text relationship detection identifies the entities of interest contained in a document and then classifies the relationships between the identified entities by type. For example, key information and the associations among that information can be extracted from documents such as bills and logistics sheets, so that the bill information can be structured and employees' working efficiency improved.
Starting from original files such as bills and logistics lists, characters and their position information are recognized using Optical Character Recognition (OCR), and the characters are aggregated according to a distance threshold to obtain text block nodes. The document information is then structured in two stages: in the first, a document information extraction model takes the aggregated text block nodes as input and extracts key information blocks; in the second, the relationships between the key information blocks are detected, taking the key information blocks as input.
After the key information blocks are extracted, the most common current method is to write logic rules based on the position information of the key information blocks and judge the associations between them in the horizontal and vertical directions using a manually set distance threshold. Another method distinguishes the keys and values of the key information blocks and determines whether keys and values match each other, detecting key-value pair associations with a deep learning model to obtain the associations between key information blocks.
However, judging the associations between key information blocks with position-based logic rules is coarse: the choice of distance threshold depends on manual experience and is strongly sample-dependent, the appropriate threshold differs from file to file, and for some files, such as those that do not follow a normal page layout, the detected relationships are unreasonable.
The information in a normal file appears in lines, and texts in the same line can be considered to have the simplest association, namely line association. However, in a document that does not follow a normal page layout, some text content may be too long or offset in position, so that its text appears in other lines; dividing by a distance threshold then assigns the text to the wrong line, and the association detection fails. In many document files the requirements on page layout are not strict, and a certain proportion of documents have no table lines and no obvious separation between rows and columns, so misaligned rows and columns are common.
In the other method, the keys and values of the key information blocks are first distinguished and then the key-value pair relationships are detected. The only known relationship types it detects are matched and unmatched, yet in real business scenarios the relationship types between texts are varied, so detecting only whether a match exists is not flexible enough to meet actual use. Moreover, because this method adds the constraint of a key-value relationship, it does not support detecting text relationships when a row or column contains only values.
[ summary of the invention ]
The application provides a text relation detection method and device based on a graph convolution neural network, and a storage medium, which can solve the problems that the detection result is unreasonable and the detection mode is not flexible enough when the association relation is determined by logic rules based on position information. The application provides the following technical scheme:
in a first aspect, a method for detecting a text relation based on a graph convolution neural network is provided, the method including:
acquiring a plurality of key information blocks of text information in a target image, wherein the key information blocks comprise a plurality of text blocks, and each text block comprises at least one character string;
inputting the character string of each text block in each key information block into a node feature extraction model to obtain the node features of the key information blocks;
for each key information block in the plurality of key information blocks, constructing the connectivity relations between each text block in the key information block and each text block in the other key information blocks;
determining the edge features of each key information block based on the connectivity relations corresponding to each key information block and the position information corresponding to each connectivity relation;
inputting the node features and the edge features into a graph convolution neural network trained in advance to obtain edge types among all key information blocks;
and determining that the key information blocks with the same edge type have an association relation.
Optionally, determining the edge features of the key information blocks based on the connectivity relations corresponding to each key information block and the position information corresponding to each connectivity relation includes:
for each connectivity relation, determining the sub-edge feature corresponding to the connectivity relation according to the relative position between the two text blocks it connects;
for each key information block, determining the sub-edge features corresponding to the connectivity relations of each text block in the key information block;
and generating the edge features based on the sub-edge features corresponding to each key information block.
Optionally, for each connectivity relation, determining the sub-edge feature corresponding to the connectivity relation according to the relative position between the two text blocks it connects includes:
discretizing the relative position by direction and distance to obtain a direction code and a distance code;
inputting the direction code and the distance code into an embedding model to obtain a direction embedding, a horizontal distance embedding and a vertical distance embedding;
and splicing the direction embedding, horizontal distance embedding and vertical distance embedding, then projecting them to a vector of fixed length to obtain the sub-edge feature.
Optionally, generating the edge features based on the sub-edge features corresponding to each key information block includes:
processing the sub-edge features into the same dimension;
and processing the processed sub-edge features into a first fixed dimension to obtain the edge features.
Optionally, generating the edge features based on the sub-edge features corresponding to each key information block includes:
for each key information block, acquiring a connectivity relation matching table formed from the connectivity relations of the key information block;
and, according to the connectivity relation matching table, looking up the corresponding set of sub-edge feature vectors in a vector table formed from the sub-edge features to obtain the edge features of the key information block.
Optionally, the inputting the character string of each text block in each key information block into a node feature extraction model to obtain the node feature of the key information block includes:
for each text block, inputting the character strings in the text block into a pre-trained Recurrent Neural Network (RNN) to obtain a feature vector of each character string;
and processing the feature vector of each character string into a second fixed dimension to obtain the node feature.
Optionally, inputting the node features and the edge features into a pre-trained graph convolution neural network to obtain the edge types between the key information blocks includes:
for each key information block, the graph convolution neural network computing target node information from the node features of the key information block and the node features and edge features of the key information blocks that have a connectivity relation with it;
and splicing the information of the nodes associated with each edge and computing the edge attribute through a multi-layer forward network to obtain the edge type.
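The last claimed step, splicing the node information an edge connects and computing the edge attribute through a multi-layer forward network, can be sketched as below. The dimensions, the edge-type count, and the two-layer shape of the forward network are all assumptions; the patent does not fix them.

```python
import numpy as np

rng = np.random.default_rng(1)
D, HIDDEN, N_TYPES = 16, 32, 4          # assumed dimensions and edge-type count
W1 = rng.standard_normal((2 * D, HIDDEN)) * 0.1
W2 = rng.standard_normal((HIDDEN, N_TYPES)) * 0.1

def edge_type_logits(node_i, node_j):
    """Splice the information of the two nodes an edge connects and compute
    the edge attribute through a small multi-layer forward network."""
    x = np.concatenate([node_i, node_j])  # splice node information
    h = np.maximum(x @ W1, 0.0)           # hidden layer with ReLU
    return h @ W2                         # one score per edge type

logits = edge_type_logits(rng.standard_normal(D), rng.standard_normal(D))
print(logits.shape)  # (4,)
```

The highest-scoring entry would be taken as the edge type; edges sharing a type are then grouped into one association.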
In a second aspect, an apparatus for detecting text relation based on a graph convolution neural network is provided, the apparatus comprising:
the key information acquisition module is used for acquiring a plurality of key information blocks of text information in a target image, wherein each key information block comprises a plurality of text blocks, and each text block comprises at least one character string;
the node feature extraction module is used for inputting the character string of each text block in each key information block into a node feature extraction model to obtain the node features of the key information blocks;
the connectivity relation building module is used for building, for each key information block of the plurality of key information blocks, the connectivity relations between each text block in the key information block and each text block in the other key information blocks;
the edge feature extraction module is used for determining the edge features of the key information blocks based on the connectivity relations corresponding to each key information block and the position information corresponding to each connectivity relation;
the edge type calculation module is used for inputting the node characteristics and the edge characteristics into a graph convolution neural network trained in advance to obtain edge types among all key information blocks;
and the incidence relation determining module is used for determining that the key information blocks with the same edge type have incidence relation.
In a third aspect, a text relation detection apparatus based on a graph convolution neural network is provided, the apparatus includes a processor and a memory; the memory stores a program, and the program is loaded and executed by the processor to implement the method for detecting text relationship based on a graph-convolution neural network according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, in which a program is stored; the program is loaded and executed by a processor to implement the method for detecting text relations based on a graph convolution neural network according to the first aspect.
The beneficial effect of this application lies in: a plurality of key information blocks of the text information in a target image are obtained, where each key information block comprises a plurality of text blocks and each text block comprises at least one character string; the character string of each text block in each key information block is input into a node feature extraction model to obtain the node features of the key information block; for each of the key information blocks, the connectivity relations between each of its text blocks and each text block in the other key information blocks are constructed; the edge features of the key information blocks are determined based on the connectivity relations corresponding to each key information block and the position information corresponding to each connectivity relation; the node features and edge features are input into a pre-trained graph convolution neural network to obtain the edge types between the key information blocks; and key information blocks with the same edge type are determined to be associated. This solves the problem of unreasonable detection results caused by determining associations with position-based logic rules, and improves the accuracy of association identification. Moreover, the relation detection method provided by this application does not need to distinguish the keys and values of the key information blocks; it detects the associations directly, which improves the efficiency of association detection.
The foregoing description is only an overview of the technical solutions of the present application. To make the technical solutions of the present application clearer and implementable according to the content of the description, preferred embodiments of the present application are described in detail below with reference to the accompanying drawings.
[ description of the drawings ]
FIG. 1 is a flowchart of a graph convolution neural network-based text relation detection method according to an embodiment of the present application;
FIG. 2 is a flowchart of a graph convolution neural network-based text relation detection method according to another embodiment of the present application;
FIG. 3 is a block diagram of a graph convolution neural network-based text relation detection apparatus according to an embodiment of the present application;
fig. 4 is a block diagram of a text relation detection apparatus based on a graph convolution neural network according to an embodiment of the present application.
[ detailed description ]
Embodiments of the present application are described in detail below in conjunction with the accompanying drawings and examples. The following examples illustrate the present application but do not limit its scope.
First, several terms referred to in the present application will be described.
Optical Character Recognition (OCR): a recognition technology that converts the text in an image into machine-readable characters.
Text block nodes: text blocks segmented by a distance threshold; each comprises the text content, the text position and the related picture background.
Key information block: a group of typed text block nodes forming valuable information in a document file, such as a price or a weight.
Node features: the feature information of a key information block, encoded from the text content and type of the key information block.
Edge features: the feature information of the connecting edges between key information blocks.
Sub-edge features: the feature information of the edges connecting text block nodes, encoded from the position information of the nodes.
Information extraction model: a model that takes text block nodes as input and extracts the key information blocks in a text file; a graph convolution network is its core component.
Recurrent Neural Network (RNN): a special neural network structure consisting of an input layer, a hidden layer and an output layer, with recurrent connections in the hidden layer.
Summary model: an artificially designed neural network structure that takes a group of feature vectors as input and outputs a fixed-dimension vector representing the semantic information of the group.
Graph Convolutional Network (GCN): a neural network that applies convolution on a graph; it can be used for Graph Embedding (GE).
A graph G = (V, E), where V is the set of nodes and E is the set of edges. Each node i has a feature x_i, and the node features can be represented by a matrix X of size N × D, where N is the number of nodes and D is the number of features of each node, i.e. the dimension of the feature vector.
Graph convolution is the process of determining the feature representation of a node from the nodes around it. The surrounding nodes may be the neighbors of the current node, that is, nodes connected to it by an edge, or the neighbors of those neighbors, and so on; the present application does not limit the type of surrounding node.
Graph convolution can be represented by the following nonlinear function:
H^(l+1) = f(H^(l), A)
where H^(0) = X is the input of the first layer, X ∈ R^(N×D), N is the number of nodes of the graph, D is the dimension of each node feature vector, and A is the adjacency matrix; the function f may be the same or different for different graph convolutional networks.
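A minimal numeric sketch of one such layer follows, using the common renormalized propagation rule as a stand-in for f; the patent leaves f unspecified, so this particular choice is an assumption.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution layer in the renormalized form
    H' = ReLU(D^-1/2 (A+I) D^-1/2 H W), one common choice of f."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)         # ReLU non-linearity

rng = np.random.default_rng(0)
H = rng.random((3, 4))                                  # N=3 nodes, D=4 features
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # path graph 0-1-2
W = rng.random((4, 2))                                  # project to 2 output dims
H1 = gcn_layer(H, A, W)
print(H1.shape)  # (3, 2)
```

Each output row mixes a node's own features with those of its neighbors, which is exactly the "determine a node's representation from surrounding nodes" behavior described above.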
Optionally, in the present application the execution subject of each embodiment is an electronic device with computing capability. The electronic device may be a terminal or a server, and the terminal may be a computer, a notebook computer, a tablet computer, and the like; the embodiments do not limit the type of terminal or electronic device.
The text relation detection method provided by this embodiment is suitable for identifying the associations between the key information blocks in a text file. Each kind of key information block corresponds to a named entity; for example, a bill file may include four kinds of key information blocks: product country of origin, product quantity, product unit price and product total price. There are multiple blocks of each kind, and this embodiment can detect the key information blocks that are associated (such as those belonging to the same product).
In practical applications, the key information blocks may come from various documents such as value-added tax invoices and insurance policies, or from other types of files, such as certificate images.
Therefore, when there are several kinds of key information block and multiple blocks of each kind, accurately identifying the associated key information blocks becomes an urgent problem. The text relation detection scheme provided by this application takes the key information blocks in a text file (such as a document or certificate file) as input, each key information block being composed of a group of typed text block nodes. An undirected graph is constructed with the key information blocks as nodes and the connectivity between them as edges, and a graph neural network encodes learned features and predicts the type of each connecting edge. The nodes and edges input to the graph neural network need corresponding feature information: node features are extracted from the key information blocks, and edge features are extracted from the connectivity relations between them. The node features and edge features are then input into the constructed graph neural network architecture to encode and learn edge relation feature vectors. Finally, edge types are predicted from the edge relation feature vectors and aggregated to obtain the associations between the key information blocks.
Because the associations can be detected automatically from the node features and edge features of the key information blocks, the problem of unreasonable detection results caused by position-based logic rules can be solved, and the accuracy of association identification is improved.
In addition, the relation detection method provided by this application does not need to distinguish the keys and values of the key information blocks; it detects the associations directly, which improves the efficiency of association detection.
The text relation detection method based on the graph convolution neural network provided by the application is described in detail below.
Fig. 1 is a flowchart of a text relation detection method based on a graph convolution neural network according to an embodiment of the present application. The method at least comprises the following steps:
step 101, obtaining a plurality of key information blocks of text information in a target image, wherein the key information blocks comprise a plurality of text blocks, and each text block comprises at least one character string.
The key information block is information to be extracted from text information of the target image, or information concerned by the user.
Alternatively, the key information blocks may be identified by the electronic device, for example using an information extraction model stored inside the electronic device that takes text block nodes as input and extracts the key information blocks in the text file; or they may be sent by other devices. This embodiment does not limit the manner of obtaining the key information blocks.
In this embodiment, a key information block is composed of a group of typed text blocks; each text block contains two kinds of information: the text content of the text block and the position information of the text block.
Optionally, the position information is a text box identified by the upper-left and lower-right corners of the text block. Alternatively, the position information may be every pixel position covered by the text block; this embodiment does not limit the implementation of the position information.
Optionally, the electronic device may further perform data format conversion on the key information blocks so that the converted format suits the subsequent steps. Data format conversions include, but are not limited to: associating the key information blocks with the text block nodes, and formatting the text block nodes according to their coordinates.
And 102, inputting the character strings of each text block in each key information block into a node feature extraction model to obtain the node features of the key information blocks.
In this embodiment, the node features of a key information block are extracted using the type information of the key information block and its corresponding group of text blocks.
In one example, inputting the character string of each text block in each key information block into a node feature extraction model to obtain the node features of the key information block includes: for each text block, inputting the character strings in the text block into a pre-trained RNN to obtain a feature vector of each character string; and processing the feature vector of each character string into a second fixed dimension to obtain node features.
Specifically, for the character strings of the group of text blocks input to the node feature extraction model, the strings are first vectorized character by character; the character vectors are then encoded by the RNN, and a Summary model weights and superposes them into a feature vector of fixed dimension, so that the feature vector represents the text features of the text block nodes.
The above example describes a node feature extraction model comprising RNN and Summary models. In practical implementations the node feature extraction model may also be another type of network: for example, the feature vector of the character string may be computed by a linear regression model, or with word2vec. This embodiment does not limit the implementation of the node feature extraction model.
Optionally, the electronic device randomly initializes a vector table of node types in advance, and looks up the vector at the index corresponding to the node type of the key information block, so that the vector represents the type feature of the key information block. The type feature is then expanded to the same dimension as the text feature and spliced with it into a new group of feature vectors, which are encoded with the RNN and Summary models to extract a feature vector of fixed dimension; the extracted feature vector represents the node feature of the key information block.
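The node-feature extraction described above can be sketched as follows. The character table, the dimensions, and the use of mean-pooling as a stand-in for the Summary model are all assumptions, not the patented implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
CHAR_DIM, HIDDEN = 8, 16                                # assumed dimensions
char_emb = rng.standard_normal((256, CHAR_DIM)) * 0.1   # stand-in character table
Wx = rng.standard_normal((CHAR_DIM, HIDDEN)) * 0.1      # input-to-hidden weights
Wh = rng.standard_normal((HIDDEN, HIDDEN)) * 0.1        # hidden-to-hidden weights

def node_feature(strings):
    """Vectorize each string character by character, encode it with a plain
    RNN, then mean-pool (stand-in 'Summary' step) the per-string vectors
    into one fixed-dimension node feature."""
    encoded = []
    for s in strings:
        h = np.zeros(HIDDEN)
        for ch in s:
            h = np.tanh(char_emb[ord(ch) % 256] @ Wx + h @ Wh)  # RNN step
        encoded.append(h)
    return np.mean(encoded, axis=0)     # fixed-dimension node feature

feat = node_feature(["unit price", "12.50"])
print(feat.shape)  # (16,)
```

A trained Summary model would learn the pooling weights rather than averaging uniformly; the fixed output dimension is what matters for the downstream graph network.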
And 103, constructing, for each key information block of the plurality of key information blocks, the connectivity relations between each text block in the key information block and each text block in the other key information blocks.
A key information block is composed of a group of text blocks and therefore has multiple pieces of text position information. One strategy is to construct a position area large enough to cover all of its text block nodes, but such an area describes the block inaccurately and may overlap with the areas of other key information nodes. In this embodiment, the connectivity relation between two key information blocks is therefore defined as the sum of the relations between their two groups of text block nodes. That is, if the first key information block contains M text blocks and the second contains N text blocks, there are M × N text-block connectivity relations between the two key information blocks, and the sum of these relations represents the connectivity between the two key information nodes, where M and N are positive integers greater than or equal to 1.
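A sketch of this M × N connectivity construction; the block contents are hypothetical identifiers.

```python
from itertools import product

def build_connectivity(blocks):
    """For every pair of key information blocks, enumerate the M * N
    text-block connectivity relations between their text blocks."""
    edges = {}
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            edges[(i, j)] = list(product(blocks[i], blocks[j]))  # M * N pairs
    return edges

# two key information blocks with M=2 and N=3 text blocks
edges = build_connectivity([["t0", "t1"], ["t2", "t3", "t4"]])
print(len(edges[(0, 1)]))  # 6
```

Each of the 6 text-block pairs later yields one sub-edge feature, and the set of sub-edge features for a block pair forms the edge between the two key information nodes.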
Step 104: determine the edge feature of each key information block based on the connection relations corresponding to the key information block and the position information corresponding to each connection relation.
Optionally, steps 103 and 104 may be executed after step 102, before step 102, or simultaneously with step 102; the execution order between steps 103-104 and step 102 is not limited in this embodiment.
In one example, determining the edge feature of each key information block based on the connection relations corresponding to the key information block and the position information corresponding to each connection relation includes: for each connection relation, determining the sub-edge feature corresponding to that connection relation according to the relative position between the two text blocks it connects; for each key information block, determining the sub-edge features corresponding to the connection relations of the text blocks in the key information block; and generating the edge feature based on the sub-edge features corresponding to each key information block.
For each connection relation, determining the sub-edge feature corresponding to the connection relation according to the relative position between the two text blocks it connects includes: discretizing the relative position by direction and by distance to obtain a direction code and distance codes; inputting the direction code and the distance codes into an embedding model to obtain a direction embedding, a horizontal distance embedding and a vertical distance embedding; and concatenating the direction embedding, the horizontal distance embedding and the vertical distance embedding, then projecting the result to a fixed-length vector to obtain the sub-edge feature.
The embedding model may be a pre-trained embedding layer.
For example: the relative position of two text blocks with a connection relation (the vector between their center lines) is discretized by direction and by distance, where discretization means the following. The direction is divided into a number of bins by angle (for example, 360 directions with adjacent directions 1 degree apart); for the distance, the vertical and horizontal offsets are divided by the height and width of the target image to obtain normalized vertical and horizontal distances, which are then multiplied by 1000 and rounded. This yields an integer direction code and integer distance codes. The embedding model then maps the direction, horizontal and vertical integer codes to their corresponding embeddings, giving three embeddings: direction, horizontal and vertical. These embeddings are concatenated and projected to a fixed-length vector, which serves as the edge feature of the directed graph.
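The discretization step above can be sketched as follows; the image size, offsets and bin count are illustrative, and the subsequent embedding lookup (a pre-trained embedding layer in this embodiment) is omitted.

```python
import math

def discretize(dx, dy, img_w, img_h, n_dirs=360):
    """Discretize the center-line vector between two text blocks.
    Direction: angle binned into n_dirs integer codes (1 degree apart here).
    Distance: horizontal / vertical offsets normalized by image width / height,
    then multiplied by 1000 and rounded to integer codes."""
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    dir_code = int(angle / (360.0 / n_dirs)) % n_dirs
    h_code = round(abs(dx) / img_w * 1000)
    v_code = round(abs(dy) / img_h * 1000)
    return dir_code, h_code, v_code

# Example: block B sits 200 px right of and 100 px below block A
# in a 1000 x 800 target image.
codes = discretize(200, 100, img_w=1000, img_h=800)
print(codes)  # (26, 200, 125)
```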
Optionally, the edge feature extraction process comprises two steps: the sub-edge features between text blocks are extracted first, and the edge features between key information blocks are then obtained by looking up the vector table of sub-edge features and encoding the results.
Looking up, encoding and extracting the edge feature from the vector table of sub-edge features includes: for each key information block, obtaining a connection matching table formed by the connection relations of the key information block; and, according to the connection matching table, looking up the corresponding set of sub-edge feature vectors in the vector table formed by the sub-edge features to obtain the edge feature of the key information block.
Optionally, after the set of sub-edge feature vectors is obtained, the electronic device processes each sub-edge feature into the same dimension, and then processes the processed sub-edge features into a first fixed dimension to obtain the edge feature.
The following example describes determining the edge feature of a key information block:
1) compute the sub-edge feature vectors between text blocks, and build a vector table of sub-edge feature vectors from the sub-edge connections between text blocks;
2) look up the sub-edge connection matching table between key information blocks in the sub-edge feature vector table to obtain a vectorized representation;
3) expand the vectorized set of sub-edge features between the key information blocks to the same dimension, and process it through the Summary model into a feature vector of a first fixed dimension, so that this vector represents the edge feature of the key information block.
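The lookup-and-summarize steps 2) and 3) can be sketched as below, with a toy sub-edge feature vector table and element-wise mean pooling standing in for the Summary model; all identifiers and vector values are hypothetical.

```python
# Hypothetical sub-edge feature vector table: one fixed-length vector per
# text-block pair, indexed by the pair of text block ids.
sub_edge_table = {
    ("a0", "b0"): [0.1, 0.2], ("a0", "b1"): [0.3, 0.0],
    ("a1", "b0"): [0.5, 0.4], ("a1", "b1"): [0.2, 0.6],
}

# Connection matching table for the key-information-block pair (A, B):
matching = [("a0", "b0"), ("a0", "b1"), ("a1", "b0"), ("a1", "b1")]

# Look up the sub-edge vectors, then 'summarize' them into one
# fixed-dimension edge feature (element-wise mean stands in for the
# learned Summary model).
vectors = [sub_edge_table[pair] for pair in matching]
edge_feature = [sum(v[i] for v in vectors) / len(vectors)
                for i in range(len(vectors[0]))]
print(edge_feature)
```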
Step 105: input the node features and edge features into a pre-trained graph convolution neural network to obtain the edge types between the key information blocks.
The graph convolution neural network takes the node features and edge features as input, and computes the edge types as follows: for each key information block, the network computes target node information from the node feature of that key information block together with the node features and edge features of the key information blocks connected to it; the node information associated with each edge is then concatenated, and a multi-layer forward network computes the edge attributes to obtain the edge type.
The computation of target node information and edge attributes is repeated several times to enlarge the receptive field and obtain high-level semantic features.
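A toy sketch of one such round, with fixed example features and a trivial scorer standing in for the trained graph convolution and multi-layer forward network; it only illustrates the data flow (aggregate neighbour node and edge features, concatenate the two node informations, score the edge types).

```python
# Toy node and edge features: short lists instead of learned vectors.
nodes = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}
edges = {(0, 1): [0.2, 0.8], (1, 2): [0.9, 0.1]}
EDGE_TYPES = ["key-value", "none"]   # hypothetical edge type labels

def update_node(i):
    """Aggregate the node's own feature with neighbour node features plus
    incident edge features (mean aggregation) into target node information."""
    msgs = [nodes[i]]
    for (a, b), e in edges.items():
        if i in (a, b):
            j = b if a == i else a
            msgs.append([n + w for n, w in zip(nodes[j], e)])
    return [sum(m[k] for m in msgs) / len(msgs) for k in range(2)]

def classify_edge(a, b):
    """Concatenate the two node informations and score each edge type
    (a fixed linear scorer stands in for the multi-layer forward network)."""
    h = update_node(a) + update_node(b)
    scores = [sum(h), -sum(h)]           # toy 2-class scorer
    return EDGE_TYPES[scores.index(max(scores))]

print(classify_edge(0, 1))
```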
Step 106: determine that key information blocks connected by the same edge type have an association relation.
Key information blocks connected by the same edge type are aggregated into a group; the group indicates that an association relation exists among those key information blocks, and the association relation is the detected edge type. The key information blocks and the edge type predictions are aggregated into a set of typed node relations; this relation set is the structured information extraction result, i.e., the text relations to be detected.
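The aggregation step can be sketched as follows; the predicted edge types and block names are hypothetical, and grouping by edge type yields the typed relation sets described above.

```python
from collections import defaultdict

# Predicted edge types between key information blocks (hypothetical output
# of the graph convolution neural network).
predicted = {
    ("name", "Zhang San"): "key-value",
    ("date", "2021-03-01"): "key-value",
    ("name", "date"): "none",
}

# Aggregate block pairs connected by the same (non-trivial) edge type into
# typed relation sets -- the structured information extraction result.
relations = defaultdict(list)
for pair, edge_type in predicted.items():
    if edge_type != "none":
        relations[edge_type].append(pair)

print(dict(relations))
```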
To make the text relation detection method based on the graph convolution neural network provided by the present application clearer, the method is illustrated below with an example; refer to fig. 2, which takes the key information blocks in a document file as an example. Text features are extracted from the text content of each key information block, and the type feature of the key information block is encoded; the text feature and the type feature are combined and encoded to obtain the node feature. All key information blocks are decomposed to obtain a candidate set of text blocks, and the sub-edge feature between every two text blocks is extracted from their position information to build a sub-edge feature vector table. The edge feature between two key information blocks is regarded as the aggregation of the sub-edge features between the two corresponding groups of text blocks; that is, each edge feature is composed of a group of sub-edge features whose subscript indices can be looked up in the sub-edge feature vector table, and once the group of sub-edge feature vectors corresponding to an edge is obtained, the edge feature between the key information blocks is obtained by encoding and extraction. The constructed graph neural network layers then encode the node features of the key information blocks and the edge features between them to learn feature vectors of the edge relations; these feature vectors are used to detect the type of edge connecting the key information blocks, and aggregation by edge type yields a set of typed key information block groups.
Optionally, steps 101-105 may be implemented in a single network model, where the input of the network model is the key information blocks and the output is the edge types.
In summary, the text relation detection method based on the graph convolution neural network provided by this embodiment obtains a plurality of key information blocks of the text information in a target image, where each key information block includes a plurality of text blocks and each text block includes at least one character string; inputs the character string of each text block in each key information block into a node feature extraction model to obtain the node feature of the key information block; for each key information block among the plurality of key information blocks, constructs a connection relation between each text block in that key information block and each text block in the other key information blocks; determines the edge feature of each key information block based on the connection relations corresponding to the key information block and the position information corresponding to each connection relation; inputs the node features and edge features into a pre-trained graph convolution neural network to obtain the edge types between the key information blocks; and determines that key information blocks with the same edge type have an association relation. This solves the problem of unreasonable detection results caused by determining association relations through logic rules set on position information, and improves the accuracy of association relation recognition. Moreover, the relation detection method provided by the present application detects association relations directly, without distinguishing the keys and values of the key information blocks, which improves the efficiency of association relation detection.
Fig. 3 is a block diagram of a text relation detection apparatus based on a graph convolution neural network according to an embodiment of the present application. The apparatus comprises at least the following modules: a key information acquisition module 310, a node feature extraction module 320, a connection relation construction module 330, an edge feature extraction module 340, an edge type calculation module 350 and an association relation determination module 360.
A key information obtaining module 310, configured to obtain multiple key information blocks of text information in a target image, where a key information block includes multiple text blocks, and each text block includes at least one character string;
the node feature extraction module 320 is configured to input the character string of each text block in each key information block into a node feature extraction model to obtain a node feature of the key information block;
a connection relation construction module 330, configured to construct, for each key information block among the plurality of key information blocks, a connection relation between each text block in that key information block and each text block in the other key information blocks;
an edge feature extraction module 340, configured to determine the edge feature of each key information block based on the connection relations corresponding to the key information block and the position information corresponding to each connection relation;
an edge type calculation module 350, configured to input the node features and the edge features into a graph convolution neural network trained in advance, so as to obtain edge types between the key information blocks;
and an association relation determining module 360, configured to determine that the key information blocks with the same edge type have an association relation.
For relevant details reference is made to the above-described method embodiments.
It should be noted that, when the text relation detection apparatus based on the graph convolution neural network provided in the above embodiment performs text relation detection, the division into the above functional modules is only used as an example; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the text relation detection apparatus based on the graph convolution neural network provided in the above embodiment belongs to the same concept as the text relation detection method based on the graph convolution neural network; its specific implementation process is detailed in the method embodiments and is not repeated here.
Fig. 4 is a block diagram of a text relation detection apparatus based on a graph convolution neural network according to an embodiment of the present application. The apparatus comprises at least a processor 401 and a memory 402.
Processor 401 may include one or more processing cores such as: 4 core processors, 8 core processors, etc. The processor 401 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 401 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 401 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed by the display screen. In some embodiments, the processor 401 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 402 may include one or more computer-readable storage media, which may be non-transitory. Memory 402 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 402 is used to store at least one instruction for execution by processor 401 to implement the graph convolution neural network based text relationship detection method provided by method embodiments herein.
In some embodiments, the apparatus for detecting text relationship based on a convolutional neural network may further include: a peripheral interface and at least one peripheral. The processor 401, memory 402 and peripheral interface may be connected by bus or signal lines. Each peripheral may be connected to the peripheral interface via a bus, signal line, or circuit board. Illustratively, peripheral devices include, but are not limited to: radio frequency circuit, touch display screen, audio circuit, power supply, etc.
Of course, the text relation detection apparatus based on the graph convolution neural network may further include fewer or more components, which is not limited in this embodiment.
Optionally, the present application further provides a computer-readable storage medium, in which a program is stored, and the program is loaded and executed by a processor to implement the graph convolution neural network-based text relation detection method according to the above method embodiment.
Optionally, the present application further provides a computer product, which includes a computer-readable storage medium, where a program is stored in the computer-readable storage medium, and the program is loaded and executed by a processor to implement the graph convolution neural network-based text relation detection method according to the foregoing method embodiment.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
The above is only one specific embodiment of the present application, and any other modifications based on the concept of the present application are considered as the protection scope of the present application.

Claims (10)

1. A text relation detection method based on a graph convolution neural network is characterized by comprising the following steps:
acquiring a plurality of key information blocks of text information in a target image, wherein the key information blocks comprise a plurality of text blocks, and each text block comprises at least one character string;
inputting the character string of each text block in each key information block into a node feature extraction model to obtain the node features of the key information blocks;
for each key information block in the plurality of key information blocks, constructing a connection relation between each text block in the key information blocks and each text block in other key information blocks;
determining the edge feature of each key information block based on the connection relations corresponding to the key information block and the position information corresponding to each connection relation;
inputting the node features and the edge features into a graph convolution neural network trained in advance to obtain edge types among all key information blocks;
and determining that the key information blocks with the same edge type have an association relation.
2. The method according to claim 1, wherein the determining the edge feature of each key information block based on the connection relations corresponding to the key information block and the position information corresponding to each connection relation comprises:
for each connection relation, determining the sub-edge feature corresponding to the connection relation according to the relative position between the two text blocks connected by the connection relation;
for each key information block, determining the sub-edge features corresponding to the connection relations of the text blocks in the key information block;
and generating the edge feature based on the sub-edge features corresponding to each key information block.
3. The method according to claim 2, wherein, for each connection relation, determining the sub-edge feature corresponding to the connection relation according to the relative position between the two text blocks connected by the connection relation comprises:
discretizing the relative position according to the direction and the distance to obtain a direction code and a distance code;
inputting the direction codes and the distance codes into an embedding model to obtain direction embedding codes, horizontal distance embedding codes and vertical distance embedding codes;
and after splicing the direction embedded code, the horizontal distance embedded code and the vertical distance embedded code, projecting to obtain a vector with a fixed length, and obtaining the sub-edge characteristics.
4. The method according to claim 2, wherein the generating the edge feature based on the sub-edge feature corresponding to each key information block comprises:
processing each sub-edge feature into the same dimension;
and processing each processed sub-edge feature into a first fixed dimension to obtain the edge feature.
5. The method according to claim 2, wherein the generating the edge feature based on the sub-edge feature corresponding to each key information block comprises:
for each key information block, acquiring a connection relation matching table formed by connection relations of the key information block;
and according to the connection relation matching table, searching a corresponding sub-edge feature vector set from a vector table formed by the sub-edge features to obtain the edge features of the key information block.
6. The method according to claim 1, wherein the inputting the character string of each text block in each key information block into a node feature extraction model to obtain the node feature of the key information block comprises:
for each text block, inputting the character strings in the text block into a pre-trained Recurrent Neural Network (RNN) to obtain a feature vector of each character string;
and processing the feature vector of each character string into a second fixed dimension to obtain the node feature.
7. The method of claim 1, wherein the inputting the node features and the edge features into a pre-trained graph convolution neural network to obtain edge types between each key information block comprises:
for each key information block, calculating, by the graph convolution neural network, target node information from the node feature of the key information block and the node features and edge features of the key information blocks having a connection relation with the key information block;
and splicing the information of each node associated with the edge, and calculating the attribute of the edge through a multi-layer forward network to obtain the edge type.
8. A graph-convolution neural network-based text relation detection apparatus, the apparatus comprising:
the key information acquisition module is used for acquiring a plurality of key information blocks of text information in a target image, wherein each key information block comprises a plurality of text blocks, and each text block comprises at least one character string;
the node feature extraction module is used for inputting the character string of each text block in each key information block into a node feature extraction model to obtain the node features of the key information blocks;
the connection relation construction module is used for constructing, for each key information block among the plurality of key information blocks, a connection relation between each text block in that key information block and each text block in the other key information blocks;
the edge feature extraction module is used for determining the edge feature of each key information block based on the connection relations corresponding to the key information block and the position information corresponding to each connection relation;
the edge type calculation module is used for inputting the node characteristics and the edge characteristics into a graph convolution neural network trained in advance to obtain edge types among all key information blocks;
and the incidence relation determining module is used for determining that the key information blocks with the same edge type have incidence relation.
9. An apparatus for detecting text relation based on graph convolution neural network, the apparatus comprising a processor and a memory; the memory stores a program that is loaded and executed by the processor to implement the graph convolution neural network-based text relation detecting method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a program is stored, which, when being executed by a processor, is configured to implement the method for detecting a text relationship based on a graph-convolution neural network according to any one of claims 1 to 7.
CN202110224515.XA 2021-03-01 2021-03-01 Text relation detection method, device and storage medium based on graph convolution neural network Active CN112949476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110224515.XA CN112949476B (en) 2021-03-01 2021-03-01 Text relation detection method, device and storage medium based on graph convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110224515.XA CN112949476B (en) 2021-03-01 2021-03-01 Text relation detection method, device and storage medium based on graph convolution neural network

Publications (2)

Publication Number Publication Date
CN112949476A true CN112949476A (en) 2021-06-11
CN112949476B CN112949476B (en) 2023-09-29

Family

ID=76246856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110224515.XA Active CN112949476B (en) 2021-03-01 2021-03-01 Text relation detection method, device and storage medium based on graph convolution neural network

Country Status (1)

Country Link
CN (1) CN112949476B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027563A (en) * 2019-12-09 2020-04-17 腾讯云计算(北京)有限责任公司 Text detection method, device and recognition system
CN114037985A (en) * 2021-11-04 2022-02-11 北京有竹居网络技术有限公司 Information extraction method, device, equipment, medium and product
CN114153959A (en) * 2021-12-08 2022-03-08 北京有竹居网络技术有限公司 Key-value matching method, apparatus, readable medium and electronic device
CN114153940A (en) * 2021-10-29 2022-03-08 北京搜狗科技发展有限公司 Text matching method, apparatus, electronic device, medium and program product
CN114219876A (en) * 2022-02-18 2022-03-22 阿里巴巴达摩院(杭州)科技有限公司 Text merging method, device, equipment and storage medium
CN114283403A (en) * 2021-12-24 2022-04-05 北京有竹居网络技术有限公司 Image detection method, device, storage medium and equipment
CN114782943A (en) * 2022-05-13 2022-07-22 广州欢聚时代信息科技有限公司 Bill information extraction method and its device, equipment, medium and product
CN114842492A (en) * 2022-04-29 2022-08-02 北京鼎事兴教育咨询有限公司 Key information extraction method and device, storage medium and electronic equipment
CN115116060A (en) * 2022-08-25 2022-09-27 深圳前海环融联易信息科技服务有限公司 Key value file processing method, device, equipment, medium and computer program product
CN116403038A (en) * 2023-03-31 2023-07-07 阿里巴巴(中国)有限公司 Data relationship identification method and data processing method for data relationship identification
CN118643801A (en) * 2024-08-15 2024-09-13 深圳市智慧城市科技发展集团有限公司 Method, device and readable storage medium for converting presentation content

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240177514A1 (en) * 2022-11-29 2024-05-30 Microsoft Technology Licensing, Llc Learning a form structure

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2185827A1 (en) * 1991-12-23 1993-06-24 Chinmoy Bhusan Bose Method and Apparatus for Connected and Degraded Text Recognition
CN109062874A (en) * 2018-06-12 2018-12-21 平安科技(深圳)有限公司 Acquisition methods, terminal device and the medium of financial data
US20190005386A1 (en) * 2017-07-01 2019-01-03 Intel Corporation Techniques for training deep neural networks
CN110825845A (en) * 2019-10-23 2020-02-21 中南大学 A Hierarchical Text Classification Method Based on Character and Self-Attention Mechanism and Chinese Text Classification Method
US20200257975A1 (en) * 2017-11-16 2020-08-13 Samsung Electronics Co., Ltd. Apparatus related to metric-learning-based data classification and method thereof
CN111553837A (en) * 2020-04-28 2020-08-18 武汉理工大学 An Artistic Text Image Generation Method Based on Neural Style Transfer
CN111581377A (en) * 2020-04-23 2020-08-25 广东博智林机器人有限公司 Text classification method and device, storage medium and computer equipment
CN111597943A (en) * 2020-05-08 2020-08-28 杭州火石数智科技有限公司 Table structure identification method based on graph neural network
CN111652162A (en) * 2020-06-08 2020-09-11 成都知识视觉科技有限公司 Text detection and identification method for medical document structured knowledge extraction
CN111784802A (en) * 2020-07-30 2020-10-16 支付宝(杭州)信息技术有限公司 Image generation method, device and equipment
CN111860257A (en) * 2020-07-10 2020-10-30 上海交通大学 A table recognition method and system integrating various text features and geometric information
CN111967387A (en) * 2020-08-17 2020-11-20 北京市商汤科技开发有限公司 Form recognition method, device, equipment and computer readable storage medium
CN112215236A (en) * 2020-10-21 2021-01-12 科大讯飞股份有限公司 Text recognition method and device, electronic equipment and storage medium
CN112241481A (en) * 2020-10-09 2021-01-19 中国人民解放军国防科技大学 Cross-modal news event classification method and system based on graph neural network


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
任明;许光;王文祥;: "家谱文本中实体关系提取方法研究", 中文信息学报, no. 06 *
夏煜, 郎荣玲, 曹卫兵, 戴冠中: "基于图像的信息隐藏检测算法和实现技术研究综述", 计算机研究与发展, no. 04 *
高李政;周刚;罗军勇;黄永忠;: "基于Bert模型的框架类型检测方法", 信息工程大学学报, no. 02 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027563A (en) * 2019-12-09 2020-04-17 腾讯云计算(北京)有限责任公司 Text detection method, device and recognition system
CN111027563B (en) * 2019-12-09 2024-12-24 腾讯云计算(北京)有限责任公司 A text detection method, device and recognition system
CN114153940A (en) * 2021-10-29 2022-03-08 北京搜狗科技发展有限公司 Text matching method, apparatus, electronic device, medium and program product
CN114037985A (en) * 2021-11-04 2022-02-11 北京有竹居网络技术有限公司 Information extraction method, device, equipment, medium and product
CN114153959A (en) * 2021-12-08 2022-03-08 北京有竹居网络技术有限公司 Key-value matching method, apparatus, readable medium and electronic device
CN114283403B (en) * 2021-12-24 2024-01-16 北京有竹居网络技术有限公司 An image detection method, device, storage medium and equipment
CN114283403A (en) * 2021-12-24 2022-04-05 北京有竹居网络技术有限公司 Image detection method, device, storage medium and equipment
CN114219876A (en) * 2022-02-18 2022-03-22 阿里巴巴达摩院(杭州)科技有限公司 Text merging method, device, equipment and storage medium
CN114842492A (en) * 2022-04-29 2022-08-02 北京鼎事兴教育咨询有限公司 Key information extraction method and device, storage medium and electronic equipment
CN114782943A (en) * 2022-05-13 Bill information extraction method and device, equipment, medium and product
CN115116060B (en) * 2022-08-25 2023-01-24 深圳前海环融联易信息科技服务有限公司 Key value file processing method, device, equipment and medium
CN115116060A (en) * 2022-08-25 2022-09-27 深圳前海环融联易信息科技服务有限公司 Key value file processing method, device, equipment, medium and computer program product
CN116403038A (en) * 2023-03-31 2023-07-07 阿里巴巴(中国)有限公司 Data relationship identification method and data processing method for data relationship identification
CN118643801A (en) * 2024-08-15 2024-09-13 深圳市智慧城市科技发展集团有限公司 Method, device and readable storage medium for converting presentation content

Also Published As

Publication number Publication date
CN112949476B (en) 2023-09-29

Similar Documents

Publication Publication Date Title
CN112949476A (en) Text relation detection method and device based on graph convolution neural network and storage medium
CN114419304B (en) A multimodal document information extraction method based on graph neural network
CN113297975B (en) Method, device, storage medium and electronic equipment for table structure recognition
CN109885692B (en) Knowledge data storage method, apparatus, computer device and storage medium
US12118813B2 (en) Continuous learning for document processing and analysis
CN113822494A (en) Risk prediction method, device, equipment and storage medium
US12118816B2 (en) Continuous learning for document processing and analysis
CN114780746A (en) Knowledge graph-based document retrieval method and related equipment thereof
CN114612921B (en) Form recognition method and device, electronic equipment and computer readable medium
CN112307749B (en) Text error detection method, text error detection device, computer equipment and storage medium
CN115984886B (en) Table information extraction method, device, equipment and storage medium
CN115168609B (en) A text matching method, device, computer equipment and storage medium
CN113360654A (en) Text classification method and device, electronic equipment and readable storage medium
CN112949477A (en) Information identification method and device based on graph convolution neural network and storage medium
CN117633245B (en) Knowledge graph construction method, device, electronic device and storage medium
CN113221523B (en) Method for processing form, computing device and computer readable storage medium
Gal et al. Cardinal graph convolution framework for document information extraction
CN115410185B (en) A method for extracting specific person and organization names from multimodal data
CN113298234A (en) Method for generating expression vector of node in multiple relation graph
CN117391086A (en) Bid participation information extraction method, device, equipment and medium
CN117573549A (en) Test case generation method, device, computer equipment and storage medium
US20260017920A1 (en) Generating templates using structure-based matching
CN112632948A (en) Case document ordering method and related equipment
CN115129871B (en) Text category determining method, apparatus, computer device and storage medium
CN112131506A (en) Webpage classification method, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20250703

Address after: 210000 Jiangsu Province, Nanjing City, Qilin Science and Technology Innovation Park, Tianjiao Road 100, Room 501-2, Building B, Jiangsu Nanjing Qiaomeng Yuan

Patentee after: Yunwan Intelligent Computing (Nanjing) Technology Co.,Ltd.

Country or region after: China

Address before: 215123 Jiangsu Province Suzhou City Suzhou Industrial Park Jinji Lake Avenue 88 Phase 7 G1-902 Unit

Patentee before: Suzhou meinenghua Intelligent Technology Co.,Ltd.

Country or region before: China