CN113283432B - Image recognition, text sorting method and device - Google Patents

Publication number
CN113283432B
Authority
CN
China
Prior art keywords
text information
sorted
text
ordered
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010106180.7A
Other languages
Chinese (zh)
Other versions
CN113283432A (en)
Inventor
郑琪
于智
李亮城
高飞宇
王永攀
张建锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202010106180.7A
Publication of CN113283432A
Application granted
Publication of CN113283432B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G06V30/153 - Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

The embodiments of the present application provide an image recognition and text sorting method and device. The method comprises: recognizing a plurality of pieces of text information to be sorted contained in an image to be recognized; determining a reading order of the plurality of pieces of text information to be sorted according to their respective features, wherein the features carry semantic features; and sorting the plurality of pieces of text information to be sorted according to the reading order to obtain a sorted text information sequence. The sorting method provided by the embodiments applies to images in any text layout format, and therefore has a wide range of application and good applicability.

Description

Image recognition and text ordering method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for image recognition and text sorting.
Background
With the development of computer technology, image text recognition technology has also developed; with it, a device can automatically recognize the text in an image.
In the prior art, the characters recognized from an image are by default ordered for reading from left to right and from top to bottom. This simple reading order is suitable only for images with a simple layout. It fails for images with a complex layout (e.g., multi-column or circular layouts), because it breaks the semantic coherence of the original text.
It can be seen that the reading-order methods in the prior art have poor applicability and generality.
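As a minimal sketch of the default ordering described above (not code from the patent), the Python snippet below sorts hypothetical text blocks from top to bottom and from left to right. The block tuples, the line_tolerance parameter, and the two-column sample are all illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical block format: (text, x, y), origin at the image's top-left corner.
Block = Tuple[str, float, float]


def naive_reading_order(blocks: List[Block], line_tolerance: float = 5.0) -> List[str]:
    """Order blocks top-to-bottom, then left-to-right within a visual line."""
    # Bucketing y lets blocks on roughly the same baseline count as one line.
    keyed = sorted(blocks, key=lambda b: (round(b[2] / line_tolerance), b[1]))
    return [text for text, _, _ in keyed]


# Two independent columns: the left column reads "The quick / brown fox",
# the right column reads "Lorem ipsum / dolor sit".
two_columns = [
    ("The quick", 10, 10), ("Lorem ipsum", 200, 10),
    ("brown fox", 10, 30), ("dolor sit", 200, 30),
]
naive_order = naive_reading_order(two_columns)
```

On this sample, naive_order interleaves the two columns line by line, which is the kind of semantic break the background describes.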
Disclosure of Invention
In view of the foregoing, the present application provides an image recognition and text sorting method and device that solve, or at least partially solve, the foregoing problems.
Thus, in one embodiment of the present application, an image recognition method is provided. The method comprises the following steps:
recognizing a plurality of pieces of text information to be sorted from an image to be recognized;
determining a reading order of the plurality of pieces of text information to be sorted according to their respective features, wherein the features carry semantic features;
and sorting the plurality of pieces of text information to be sorted according to the reading order to obtain a sorted text information sequence.
In yet another embodiment of the present application, a text sorting method is provided. The method comprises the following steps:
acquiring a plurality of pieces of text information to be sorted;
determining a reading order of the plurality of pieces of text information to be sorted by combining their respective features with the adjacency relationship among them;
and sorting the plurality of pieces of first text information according to the reading order to obtain a first text information sequence.
In another embodiment of the present application, an image recognition method is provided. The method comprises the following steps:
recognizing a plurality of pieces of text information to be sorted from an image to be recognized;
determining the character type of the plurality of pieces of text information to be sorted;
acquiring an arrangement rule corresponding to the character type;
and sorting the plurality of pieces of text information to be sorted according to the arrangement rule to obtain a sorted text information sequence.
In another embodiment of the present application, an electronic device is provided. The device comprises a memory and a processor, wherein
the memory is configured to store a program;
the processor, coupled to the memory, is configured to execute the program stored in the memory for:
recognizing a plurality of pieces of text information to be sorted from an image to be recognized;
determining a reading order of the plurality of pieces of text information to be sorted according to their respective features, wherein the features carry semantic features;
and sorting the plurality of pieces of text information to be sorted according to the reading order to obtain a sorted text information sequence.
In another embodiment of the present application, an electronic device is provided. The device comprises a memory and a processor, wherein
the memory is configured to store a program;
the processor, coupled to the memory, is configured to execute the program stored in the memory for:
acquiring a plurality of pieces of text information to be sorted;
determining a reading order of the plurality of pieces of text information to be sorted by combining their respective features with the adjacency relationship among them;
and sorting the plurality of pieces of first text information according to the reading order to obtain a first text information sequence.
In another embodiment of the present application, an electronic device is provided. The device comprises a memory and a processor, wherein
the memory is configured to store a program;
the processor, coupled to the memory, is configured to execute the program stored in the memory for:
recognizing a plurality of pieces of text information to be sorted from an image to be recognized;
determining the character type of the plurality of pieces of text information to be sorted;
acquiring an arrangement rule corresponding to the character type;
and sorting the plurality of pieces of text information to be sorted according to the arrangement rule to obtain a sorted text information sequence.
According to the technical solution provided by the embodiments of the present application, after the plurality of pieces of text information to be sorted contained in the image to be recognized are recognized, they are ordered for reading with reference to their semantics. The sorting method provided by the embodiments applies to images in any text layout format, and therefore has a wide range of application and good applicability.
According to the technical solution provided by the embodiments of the present application, when the plurality of pieces of text information to be sorted are ordered for reading, both the features of each piece and the adjacency relationship among the pieces are taken into account. This can effectively improve sorting accuracy and the semantic coherence of the resulting text information sequence.
Drawings
To illustrate the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show some embodiments of the present application; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1a is a diagram illustrating an example of an image recognition method according to an embodiment of the present application;
FIG. 1b is a diagram illustrating an example of an image recognition method according to another embodiment of the present application;
FIG. 1c is a flowchart illustrating an image recognition method according to an embodiment of the present application;
FIG. 2 is a flowchart of a text sorting method according to an embodiment of the present application;
FIG. 3 is a block diagram illustrating an image recognition apparatus according to an embodiment of the present application;
FIG. 4 is a block diagram of a text sorting apparatus according to another embodiment of the present application;
FIG. 5 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
At present, image text recognition products by default provide a simple left-to-right, top-to-bottom reading order, and this simple ordering scheme fails for images with complex layouts.
The prior art offers two methods: the layout analysis method and the structured template method.
The layout analysis method has two variants, bottom-up and top-down. The bottom-up variant uses visual information of the text blocks, such as distance, size, and color, to merge them into paragraphs by rules; within each merged paragraph, the text is still ordered from left to right and from top to bottom. The top-down variant directly applies an image segmentation method to cut the image into paragraphs; once the paragraphs are segmented, the text inside each paragraph is likewise read from left to right and from top to bottom.
The layout analysis method can handle most document images, that is, images consisting of large areas of regular text. However, it performs poorly on complex mixed image-and-text layouts, such as online advertisement images and e-commerce product description images.
The structured template method outputs a text structure according to configured template rules. It can handle more complex layouts, but it only suits relatively uniform layout formats, such as invoices, certificates, and bank cards, and cannot produce semantically related sequences in the general case.
To improve the applicability and generality of reading-order sorting, the embodiments of the present application provide a semantics-based method for ordering text for reading.
In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions according to the embodiments of the present application with reference to the accompanying drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Furthermore, some of the flows described in the specification, claims, and drawings include a plurality of operations that appear in a particular order; these operations may nevertheless be performed out of that order or in parallel. Operation numbers such as 101 and 102 merely distinguish the operations from one another and do not themselves imply any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. Note that the terms "first" and "second" herein are used to distinguish different messages, devices, modules, and so on; they do not indicate a sequence, nor do they require that the "first" and the "second" be of different types.
FIG. 1c is a schematic flowchart of an image recognition method according to an embodiment of the present application. The method may be executed by a client or a server. The client may be hardware with an embedded program integrated in a terminal, application software installed in the terminal, or tool software embedded in the terminal's operating system; the embodiments of the present application do not limit this. The terminal may be any terminal device, such as a mobile phone, a tablet computer, or a smart speaker. The server may be an ordinary server, a cloud server, a virtual server, or the like; the embodiments of the present application do not specifically limit this.
As shown in fig. 1c, the method comprises:
101. Recognize a plurality of pieces of text information to be sorted contained in an image to be recognized.
102. Determine the reading order of the plurality of pieces of text information to be sorted according to their respective features.
103. Sort the plurality of pieces of text information to be sorted according to the reading order to obtain a sorted text information sequence.
In 101 above, the image to be recognized is an image containing text information, such as an online advertisement image, an e-commerce product description image, an invoice image, a certificate image, or a bank card image.
The image to be recognized may be processed with an image text recognition algorithm, such as OCR (Optical Character Recognition), to obtain the plurality of pieces of text information to be sorted that it contains. For the specific implementation of the image text recognition algorithm, refer to the prior art; it is not described here. In general, recognizing the image with such an algorithm yields not only the pieces of text information to be sorted but also their positions in the image to be recognized.
Here, a piece of text information to be sorted is a single character or a word. For example, "I" is a single character and "we" is a word. In practical applications, a piece of text information to be sorted is usually a single character.
In 102 above, the features carry semantic features. A natural language processing algorithm may be used to extract the semantic features corresponding to each recognized piece of text information to be sorted. For the extraction process, refer to the prior art; it is not described in detail here. In one example, the semantic features corresponding to each piece of text information to be sorted may be used directly as its features.
In another example, the features may also carry visual features. Combining semantic features with visual features can effectively improve sorting accuracy. For example, suppose the text information recognized in the image includes "I", "you", and a plural suffix (rendered as "s" in translation). Semantically, the suffix could follow either "I" or "you", forming "we" or the plural "you". If the visual features show that the font of the suffix matches the font of only one of the two characters, the suffix can be placed after that character.
It should be noted that the features involved in the embodiments of the present application may be in vector form.
In one implementation, the reading order of the plurality of pieces of text information to be sorted may be determined according to their respective semantic features and a preset grammar.
In 103 above, sorting the plurality of pieces of text information according to the reading order yields a text information sequence that is semantically coherent.
According to the technical solution provided by the embodiments of the present application, after the plurality of pieces of text information to be sorted contained in the image to be recognized are recognized, they are ordered for reading with reference to their semantics. The sorting method provided by the embodiments applies to images in any text layout format, and therefore has a wide range of application and good applicability.
In the image to be recognized, certain association relationships, such as semantic, visual, and/or positional associations, are implicit among the plurality of pieces of text information to be sorted. These relationships are important information; using them can effectively improve sorting accuracy. Therefore, in one example, "determining the reading order of the plurality of pieces of text information to be sorted according to their respective features" in 102 above may be implemented by the following steps:
1021. Determine the adjacency relationship among the plurality of pieces of text information to be sorted.
1022. Determine the reading order of the plurality of pieces of text information to be sorted by combining their respective features with the adjacency relationship among them.
In 1021 above, the adjacency relationship among the plurality of pieces of text information to be sorted indicates how each piece is associated with the other pieces.
The adjacency relationship among the plurality of text messages to be ordered can be determined by one or more of the following methods:
Method one: for each piece of text information to be sorted, search the image to be recognized for other pieces of text information to be sorted that lie within a set range centered on that piece's position in the image. The piece is determined to be adjacent to the pieces found within the set range, and not adjacent to the pieces outside the set range.
In method one, the adjacency relationship is specifically a position-related adjacency relationship.
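Method one can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes each piece of text information is represented by a center point and that the "set range" is a circle of a given radius, neither of which is specified in the text. The names position_adjacency, centers, and radius are hypothetical.

```python
import math
from typing import List, Tuple


def position_adjacency(centers: List[Tuple[float, float]], radius: float) -> List[List[int]]:
    """Return a symmetric 0/1 matrix; entry [i][j] is 1 when j lies within `radius` of i."""
    n = len(centers)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: the "set range" is a circle around each center point.
            if math.dist(centers[i], centers[j]) <= radius:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


sample_centers = [(0.0, 0.0), (3.0, 4.0), (30.0, 40.0)]
position_adj = position_adjacency(sample_centers, radius=10.0)
```

Here the first two points are 5 units apart and become adjacent, while the third point is outside the range of both.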
Method two: determine the adjacency relationship among the plurality of pieces of text information to be sorted according to their respective features.
Method two may be implemented by the following steps:
S11. For every two pieces of text information among the plurality of pieces to be sorted, calculate the correlation between the two according to their respective features.
S12. Determine whether the two pieces of text information to be sorted are adjacent according to the correlation between them.
In S11 above, one possible scheme is to directly calculate the similarity between the features corresponding to every two pieces of text information to be sorted and to use that similarity as the correlation between the two pieces. In practical applications, the features may be in vector form, and the inner product of the two feature vectors may be used as the similarity.
In S12 above, a correlation threshold may be set in advance; if the correlation between two pieces of text information to be sorted is greater than or equal to the threshold, the two pieces are determined to be adjacent.
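Steps S11 and S12 can be sketched as follows, assuming the features are plain Python lists of floats. The function names and the sample threshold are illustrative and do not come from the patent.

```python
from typing import List, Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """S11: inner product of two feature vectors, used as their similarity."""
    return sum(x * y for x, y in zip(a, b))


def feature_adjacency(features: List[Sequence[float]], threshold: float) -> List[List[int]]:
    """S12: two pieces are adjacent when their correlation reaches the threshold."""
    n = len(features)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if inner_product(features[i], features[j]) >= threshold:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


sample_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
feature_adj = feature_adjacency(sample_features, threshold=0.5)
```

With these sample vectors only the first two pieces reach the threshold (inner product 0.9), so only they are marked adjacent.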
Because the features carry semantic features, the adjacency relationship here is specifically a semantics-related adjacency relationship.
In another example, the features may also carry visual features, in which case the adjacency relationship is specifically related to both semantics and vision.
When the features also carry visual features, to improve the accuracy of the correlation calculation, "calculating the correlation between every two pieces of text information to be sorted according to their respective features" in S11 above may be implemented by the following steps:
A. Calculate a first similarity between the semantic features corresponding to the two pieces of text information to be sorted.
B. Calculate a second similarity between the visual features corresponding to the two pieces of text information to be sorted.
C. Combine the first similarity and the second similarity to determine the correlation between the two pieces of text information to be sorted.
In step A above, the semantic features are in vector form, and the inner product of the semantic features may be used as the first similarity.
In step B above, the visual features are in vector form, and the inner product of the visual features may be used as the second similarity.
In step C above, in one implementation, the sum of the first similarity and the second similarity may be used as the correlation between the two pieces of text information to be sorted.
In another implementation, the first similarity and the second similarity may be combined in a weighted sum to obtain the correlation between the two pieces of text information to be sorted. The weights of the first similarity and the second similarity may be set according to actual needs, for example on the basis of prior experience; the embodiments of the present application do not specifically limit this.
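Steps A to C can be sketched as below. The function name combined_correlation and the weight parameters w_sem and w_vis are hypothetical; with both weights left at 1.0 the result is the plain sum of the two similarities, and other values give the weighted sum.

```python
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def combined_correlation(
    sem_a: Sequence[float], sem_b: Sequence[float],
    vis_a: Sequence[float], vis_b: Sequence[float],
    w_sem: float = 1.0, w_vis: float = 1.0,
) -> float:
    first_similarity = inner_product(sem_a, sem_b)    # step A: semantic features
    second_similarity = inner_product(vis_a, vis_b)   # step B: visual features
    # Step C: plain sum when both weights are 1.0, weighted sum otherwise.
    return w_sem * first_similarity + w_vis * second_similarity


plain_sum = combined_correlation([1.0, 2.0], [3.0, 4.0], [1.0, 0.0], [0.5, 0.5])
weighted = combined_correlation([1.0, 2.0], [3.0, 4.0], [1.0, 0.0], [0.5, 0.5],
                                w_sem=0.5, w_vis=2.0)
```

In this sample the semantic similarity is 11.0 and the visual similarity is 0.5, so plain_sum is 11.5 and weighted is 6.5.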
In 1021, the adjacency relationship between the plurality of text messages to be ordered indicates whether each two text messages to be ordered in the plurality of text messages to be ordered are adjacent.
In 1022 above, the features corresponding to a piece of text information to be sorted contain only information about that piece itself and no information about its surroundings (e.g., information about the other pieces of text information to be sorted that are adjacent to it). In other words, the features are not expressive or comprehensive enough. To improve their expressiveness, the features corresponding to the plurality of pieces of text information to be sorted may be updated by combining them with the adjacency relationship among the pieces, yielding updated features for each piece. The reading order of the plurality of pieces of text information to be sorted is then determined according to these updated features. The updated features of a piece contain not only information about the piece itself but also information about its adjacent pieces; they are more abstract and more expressive, which helps improve sorting accuracy.
In one example, "determining the reading order of the plurality of pieces of text information to be sorted by combining their respective features with the adjacency relationship among them" in 1022 above may be implemented by the following steps:
S21. Construct a graph structure with nodes and edges according to the adjacency relationship among the plurality of pieces of text information to be sorted.
The nodes in the graph structure represent the pieces of text information to be sorted, and the edges represent whether nodes are adjacent.
S22. Take the features corresponding to the plurality of pieces of text information to be sorted, together with the graph structure, as the input of a trained graph convolutional neural network model, and execute the model to obtain the reading order of the plurality of pieces of text information to be sorted.
In S21 above, an edge between two nodes indicates that the two nodes are adjacent. The graph structure may be represented by an adjacency matrix, and may also be referred to as a topology.
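A minimal sketch of S21, assuming the adjacency relationship is already available as a list of index pairs. The names build_adjacency_matrix and neighbors are illustrative and are not taken from the patent.

```python
from typing import List, Tuple


def build_adjacency_matrix(num_nodes: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Each node is one piece of text information; each edge marks an adjacent pair."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for a, b in edges:
        # Adjacency is treated as undirected, so the matrix is symmetric.
        matrix[a][b] = 1
        matrix[b][a] = 1
    return matrix


def neighbors(matrix: List[List[int]], node: int) -> List[int]:
    """Indices of the nodes that share an edge with `node`."""
    return [j for j, flag in enumerate(matrix[node]) if flag]


graph = build_adjacency_matrix(4, [(0, 1), (1, 2)])
```

In this sample, node 1 is adjacent to nodes 0 and 2, and node 3 has no edges.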
In S22 above, the graph convolutional neural network model extracts features well, which can effectively improve sorting accuracy.
The graph convolution neural network model is specifically used for:
S31. Obtain updated features corresponding to the plurality of pieces of text information to be sorted through a graph convolution operation, according to their respective features and the graph structure.
S32. Determine the reading order of the plurality of pieces of text information to be sorted according to their respective updated features.
In S31 above, the graph convolution operation embeds the graph structure information into the features corresponding to the plurality of pieces of text information to be sorted, yielding the updated features. To obtain higher-dimensional features, the graph convolution operation may be performed multiple times.
Specifically, the graph convolutional neural network model may include a feature extraction sub-network, which may include a plurality of graph convolution network layers, each of which performs one graph convolution operation. The features corresponding to the plurality of pieces of text information to be sorted, together with the adjacency matrix representing the graph structure, serve as the input of the first graph convolution network layer; the features output by each graph convolution network layer serve as the input of the next one; and the features output by the last graph convolution network layer serve as the updated features corresponding to the plurality of pieces of text information to be sorted. Note that the features output by each graph convolution network layer differ from the features input to that layer: they are more abstract and of higher dimension.
Note also that each graph convolution network layer includes a trained feature extraction parameter matrix, which the layer uses when performing the graph convolution operation. The specific implementation of the graph convolution operation may be designed according to actual needs and is not specifically limited in this embodiment.
In one implementation, the plurality of pieces of text information to be sorted includes a first piece of text information to be sorted, which refers to any one of the plurality of pieces. Each graph convolution network layer is specifically configured to: determine, according to the first piece of text information to be sorted and the adjacency matrix, at least one second piece of text information to be sorted that is adjacent to the first piece; concatenate the features corresponding to the first piece with the features corresponding to each of the at least one second piece to obtain at least one first spliced feature; combine the at least one first spliced feature into a spliced feature matrix; multiply the spliced feature matrix by the trained feature extraction parameter matrix to obtain a first matrix; and pool the first matrix to obtain the features corresponding to the first piece of text information to be sorted. The pooling may specifically be average pooling or maximum pooling.
For example, if the input feature of the first piece of text information to be sorted is an h-dimensional vector and the feature of a second piece of text information to be sorted is a j-dimensional vector, the first spliced feature is an (h+j)-dimensional vector.
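One graph convolution network layer as described above can be sketched in plain Python. This is an illustration under assumptions, not the patent's implementation: the weight matrix stands in for the trained feature extraction parameter matrix, and pairing a node with itself when it has no neighbors is an assumption, since the text does not cover that case. All names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (rows x k) and b (k x cols)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def graph_conv_layer(features: List[Vector], adjacency: List[List[int]],
                     weight: Matrix, pooling: str = "mean") -> List[Vector]:
    updated = []
    for i, feat in enumerate(features):
        nbrs = [j for j, flag in enumerate(adjacency[i]) if flag]
        if not nbrs:
            nbrs = [i]  # assumption: an isolated node is paired with itself
        # One spliced (concatenated) feature per neighbor, stacked into a matrix.
        spliced = [feat + features[j] for j in nbrs]
        first_matrix = matmul(spliced, weight)
        columns = list(zip(*first_matrix))
        if pooling == "max":
            updated.append([max(c) for c in columns])       # maximum pooling
        else:
            updated.append([sum(c) / len(c) for c in columns])  # average pooling
    return updated


node_features = [[1.0], [2.0], [3.0]]
node_adjacency = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a trained parameter matrix
mean_out = graph_conv_layer(node_features, node_adjacency, identity)
max_out = graph_conv_layer(node_features, node_adjacency, identity, pooling="max")
```

With the identity stand-in, node 0 has the spliced rows [1, 2] and [1, 3], so average pooling gives [1.0, 2.5] and maximum pooling gives [1.0, 3.0]. The output dimension is set by the column count of the weight matrix, so stacked layers need compatible shapes.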
In one implementation, "determining the reading order of the plurality of pieces of text information to be sorted according to their respective updated features" in S32 above may be implemented by the following steps:
S321. Combine the updated features corresponding to the plurality of pieces of text information to be sorted to calculate a global text information feature, which serves as the initial reference feature.
S322. Calculate an attention weight for each piece of text information to be sorted that has not yet been output, according to the reference feature and the updated feature corresponding to that piece.
S323. Output the piece of text information to be sorted that has the largest attention weight, take its updated feature as the new reference feature, and repeat the attention weight calculation step until all pieces of text information to be sorted have been output.
S324. Determine the order in which the plurality of pieces of text information to be sorted were output as their reading order.
In practical applications, the graph convolutional neural network model may also include an attention sub-network. Steps S321 and S322 described above are performed by the attention sub-network.
In S321, in one example, the updated features corresponding to the plurality of text information to be ordered may be pooled to obtain the global text information feature. The pooling may specifically be average pooling or max pooling. The global text information feature thus integrates the features of the plurality of text information to be ordered.
Determining the global text information feature makes it easier to find the text information that comes first in the reading order.
In another example, a graph convolution operation may be used to further update the updated features corresponding to the plurality of text information to be ordered, and the further updated features are then pooled to obtain the global text information feature. That is, the attention sub-network includes a graph convolution network layer, which may specifically be a fully connected network layer, and a pooling layer.
In S322, the at least one text information to be ordered that has not yet been output includes third text information to be ordered, which refers to any one of the at least one text information to be ordered that has not yet been output. Taking features in vector form as an example, the reference feature and the updated feature corresponding to the third text information to be ordered may be spliced to obtain a second spliced feature, and the dot product of the second spliced feature and the attention parameter vector gives the attention weight corresponding to the third text information to be ordered.
For example, if the reference feature is an n-dimensional vector and the updated feature corresponding to the third text information to be ordered is an m-dimensional vector, the second spliced feature is an (n+m)-dimensional vector.
In S323, the text information to be sorted corresponding to the maximum attention weight is output, and the updated feature corresponding to the text information to be sorted corresponding to the maximum attention weight is used as the new reference feature.
If not all of the plurality of text information to be ordered have been output, the attention weights of the text information to be ordered that have not yet been output are calculated again based on the new reference feature.
If all of the plurality of text information to be ordered have been output, the attention weight calculation step stops.
In S324, the output sequence of the plurality of text messages to be ordered is the reading sequence of the plurality of text messages to be ordered.
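Steps S321 to S324 can be sketched as a greedy loop. The following is a minimal illustration, assuming features are plain Python lists and that the global feature is obtained by pooling alone; the function name `reading_order` and the values of `attn_params` are hypothetical, since the text gives no trained parameters.

```python
def reading_order(updated, attn_params, pooling="mean"):
    """Greedy attention decoding following steps S321 to S324.

    updated     : list of N updated feature vectors (lists of floats, length m)
    attn_params : attention parameter vector of length 2m (hypothetical values)
    Returns the item indices in their predicted reading order.
    """
    # S321: pool all updated features into a global feature, which serves
    # as the initial reference feature.
    if pooling == "mean":
        reference = [sum(col) / len(col) for col in zip(*updated)]
    else:
        reference = [max(col) for col in zip(*updated)]

    remaining = list(range(len(updated)))
    order = []
    while remaining:
        # S322: splice the reference with each candidate's updated feature
        # and take the dot product with the attention parameter vector.
        weights = {}
        for i in remaining:
            spliced = reference + updated[i]
            weights[i] = sum(x * p for x, p in zip(spliced, attn_params))
        # S323: output the candidate with the largest attention weight and
        # use its updated feature as the new reference feature.
        best = max(remaining, key=lambda i: weights[i])
        order.append(best)
        remaining.remove(best)
        reference = updated[best]
    # S324: the output order is the reading order.
    return order
```

In this sketch a tie for the maximum weight is broken by taking the lowest remaining index, which is an arbitrary choice made only to keep the loop deterministic.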
In the above embodiment, it is assumed by default that only one text information to be ordered corresponds to the maximum attention weight. When several text information to be ordered share the maximum attention weight, multiple reading orders arise. In that case the graph convolutional neural network model can determine multiple reading orders for the plurality of text information to be ordered, and the text information can be sorted according to each reading order to obtain multiple text information sequences. The method may further include displaying these sequences on a user interface for the user to choose from. In addition, the model can be optimized according to the target sequence selected by the user: specifically, the model can be trained once more on the image to be identified together with the target sequence, thereby optimizing the model.
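One way to picture the tie case is to branch whenever several candidates share the maximum attention weight. The sketch below is an assumption-laden illustration, not the patented procedure: the name `all_reading_orders`, the tolerance `tol` used to decide that two weights are equal, and the mean-pooled starting reference are all hypothetical.

```python
def all_reading_orders(updated, attn_params, tol=1e-9):
    """Enumerate every reading order produced when ties for the maximum
    attention weight are followed in all directions.

    updated     : list of N updated feature vectors (lists of floats)
    attn_params : attention parameter vector (hypothetical values)
    tol         : hypothetical tolerance for treating two weights as tied
    """
    def weight(reference, i):
        spliced = reference + updated[i]
        return sum(x * p for x, p in zip(spliced, attn_params))

    def expand(reference, remaining, prefix):
        if not remaining:
            return [prefix]
        weights = {i: weight(reference, i) for i in remaining}
        top = max(weights.values())
        orders = []
        for i in remaining:
            # Branch on every candidate tied for the maximum weight.
            if top - weights[i] <= tol:
                rest = [k for k in remaining if k != i]
                orders.extend(expand(updated[i], rest, prefix + [i]))
        return orders

    # Assumption: mean pooling gives the initial reference feature.
    global_feature = [sum(col) / len(col) for col in zip(*updated)]
    return expand(global_feature, list(range(len(updated))), [])
```

The number of orders can grow quickly when many items tie, so a practical system would probably cap how many candidate sequences are shown to the user.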
The training process of the graph convolutional neural network model is as follows:
104. And acquiring a sample image and expected text information sequences corresponding to a plurality of sample text information contained in the sample image.
105. And optimizing the graph convolutional neural network model according to the sample features corresponding to the plurality of sample text information, the graph structure corresponding to the plurality of sample text information, and the expected text information sequence.
In 105, the sample characteristics corresponding to the plurality of sample text information and the graph structure corresponding to the plurality of sample text information may be input into the graph convolutional neural network model, a predicted reading sequence corresponding to the plurality of sample text information may be determined, the plurality of sample text information may be ordered according to the predicted reading sequence to obtain a predicted text information sequence, and the graph convolutional neural network model may be parameter optimized according to a difference between the predicted text information sequence and the expected text information sequence. The specific parameter optimization process can be found in the prior art, and is not described in detail herein.
In practical application, the method may further include:
106. and extracting semantic features corresponding to the identified text information to be sequenced.
107. And extracting visual features corresponding to the text information to be sequenced respectively.
108. And merging semantic features and visual features corresponding to the plurality of text information to be sequenced to obtain the features corresponding to the plurality of text information to be sequenced.
In 106, a natural language processing algorithm may be used to extract semantic features corresponding to the identified text information to be ranked.
In one possible implementation manner, the "extracting the visual features corresponding to each of the plurality of text information to be ordered" in the above 107 may be implemented by:
1071. And determining the sub-image area where the plurality of text information to be ordered are respectively located from the image to be identified according to the positions of the plurality of text information to be ordered in the image to be identified.
1072. And extracting visual features corresponding to the plurality of text information to be sequenced respectively according to the sub-image areas where the plurality of text information to be sequenced are respectively.
In 1071, the image text information recognition technology is used to recognize a plurality of text information to be sequenced and the positions of each text information to be sequenced in the image to be processed.
In an example, the sub-image area in which the text information to be ordered is located may specifically be a compact rectangular frame area surrounding the text information to be ordered.
In 1072, the visual characteristics may include information such as fonts, font colors, and background textures.
In practice, some conventional feature extraction algorithms, such as SIFT (scale-invariant feature transform), may be used to extract the visual features.
The visual features extracted by conventional feature extraction algorithms are low-dimensional rather than high-dimensional information, that is, their expressive power is limited. To improve the expressiveness of the visual features, in one example a trained convolutional neural network may be used to extract them. For example, the sub-image regions in which the plurality of text information to be ordered are located may each be input into the trained convolutional neural network to obtain the visual features corresponding to each text information to be ordered. For the specific implementation and training process of the convolutional neural network, refer to the prior art; the details are not repeated here.
In 108, the plurality of text information to be ordered includes the first text information to be ordered, and the semantic features and the visual features corresponding to the first text information to be ordered may be spliced to obtain the features corresponding to the first text information to be ordered.
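The sub-image extraction in 1071 and the splicing in 108 can be illustrated with a small sketch. It assumes the image is a nested list of pixel rows and that each position is a hypothetical `(left, top, right, bottom)` box with exclusive right and bottom edges; the helper names `crop` and `fuse` are illustrative only, and the visual feature extractor itself (for example a CNN) is left out.

```python
def crop(image, box):
    """Return the compact rectangular sub-image for one text item.

    image : list of pixel rows (each row a list of values)
    box   : (left, top, right, bottom), right and bottom exclusive (assumed)
    """
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def fuse(semantic, visual):
    """Splice (concatenate) a semantic feature vector and a visual feature
    vector into the single feature vector used for one text item."""
    return list(semantic) + list(visual)
```

If the semantic feature has p dimensions and the visual feature has q dimensions, the fused feature produced this way has p+q dimensions.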
In practical applications, the image to be identified often contains multiple text regions that are far apart from one another. In that case, the image can first be divided into text regions and the text within each region ordered separately, which reduces the difficulty of the subsequent ordering and improves its accuracy. Thus, in one example, the above method may further comprise:
109. and identifying a plurality of text information contained in the image to be identified and the position of each text information in the image to be identified.
110. And dividing the plurality of text information by using a clustering algorithm according to the position of each text information in the image to be identified to obtain at least one text information cluster.
111. And selecting a plurality of text information in one text information cluster from the at least one text information cluster as the text information to be sequenced.
The above 109 may specifically be implemented with an OCR algorithm. For the specific implementation, refer to the corresponding content in the above embodiments, which is not repeated here.
In 110, a hierarchical clustering algorithm may be specifically adopted as the clustering algorithm. The distance between the text information in the same text information cluster is smaller than the distance between the text information in different text information clusters. The specific implementation process of the clustering algorithm can be referred to the prior art, and will not be described herein.
In the above 111, a plurality of text information in one text information cluster is selected from the at least one text information cluster as the plurality of text information to be ordered.
In practical application, the method can sort the text information in each text information cluster.
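Steps 109 to 111 rely on a clustering algorithm that the text leaves to the prior art. As one possible illustration, the sketch below uses single-linkage agglomerative clustering on text center points; the stopping threshold `max_gap`, the use of center points, and the function name `cluster_by_position` are assumptions rather than details from the source.

```python
import math


def cluster_by_position(centers, max_gap):
    """Group text items by position with single-linkage agglomerative
    clustering.

    centers : list of (x, y) center points, one per text item
    max_gap : hypothetical distance threshold; merging stops once the two
              closest clusters are farther apart than this
    Returns a list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(centers))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(centers[i], centers[j]) for i in a for j in b)

    while len(clusters) > 1:
        pairs = [(linkage(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        gap, p, q = min(pairs)
        if gap > max_gap:
            break
        # Merge the two closest clusters.
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]
```

Each returned cluster can then be ordered on its own, so the ordering model only ever sees text items that lie close together.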
An example of an image recognition method according to an embodiment of the present application will be described with reference to fig. 1 a:
Step 1, performing character recognition on the image to be identified, and recognizing that it contains the three characters 'year', 'festival' and 'goods'.
Step 2, extracting the semantic features corresponding to each of the three characters 'year', 'festival' and 'goods' by using a natural language processing algorithm.
Step 3, ordering the characters according to their semantic features to obtain the character sequence 'annual goods festival'.
Step 4, outputting the character sequence to an interface for display.
An image recognition method according to a further embodiment of the present application will be described by way of example with reference to fig. 1 b:
Step a, performing character recognition on the image to be identified, and recognizing that it contains the three characters 'year', 'festival' and 'goods' together with the positions of the three characters in the image to be identified. The sub-image (that is, the sub-image region) in which each character is located is then taken out according to the position of that character in the image to be identified.
Step b, extracting the semantic features corresponding to each of the three characters 'year', 'festival' and 'goods' by using a natural language processing algorithm, and extracting the visual features of the sub-image in which each character is located by using a convolutional neural network (CNN), to obtain the visual features corresponding to each character.
Step c, calculating the correlation between any two characters according to the similarity of their visual features and the similarity of their semantic features, and constructing a graph structure based on the correlations.
Step d, splicing the semantic features and the visual features corresponding to each character to obtain the features corresponding to that character, inputting the graph structure and the features corresponding to each character into the trained graph convolutional neural network model, and executing the model to obtain the reading order of the three characters.
Step e, ordering the three characters according to their reading order to obtain the character sequence 'annual goods festival', and outputting the character sequence to an interface for display.
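The graph construction in the example above (correlation from semantic and visual similarity, then adjacency) can be sketched as follows. The text does not name a similarity measure or say how the two similarities are combined, so cosine similarity, the mixing weight `alpha`, and the cut-off `threshold` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_adjacency(semantic, visual, alpha=0.5, threshold=0.5):
    """Build a 0/1 adjacency matrix from pairwise feature correlation.

    semantic, visual : lists of N feature vectors each
    alpha            : hypothetical weight of the semantic similarity
    threshold        : hypothetical correlation cut-off for adjacency
    """
    n = len(semantic)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            # Combine the first (semantic) and second (visual) similarity.
            correlation = (alpha * cosine(semantic[i], semantic[k])
                           + (1 - alpha) * cosine(visual[i], visual[k]))
            if correlation >= threshold:
                adjacency[i][k] = adjacency[k][i] = 1
    return adjacency
```

The resulting matrix is symmetric with a zero diagonal: row i lists the nodes that share an edge with node i.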
The proposed method does not depend on templates and can produce a definite order under any typesetting, so it has a wider range of application. On the test set, the evaluation metrics for both ordinary horizontal text and column-typeset text are higher than 80%.
Yet another embodiment of the present application provides an image recognition method, including:
501. and identifying a plurality of text information to be sequenced contained in the image to be identified.
502. And determining the character types of the plurality of character information to be ordered.
503. And obtaining an arrangement rule corresponding to the text type.
504. And sorting the plurality of word information to be sorted according to the arrangement rule to obtain a word information sequence to be sorted.
The specific implementation of 501 may be referred to the corresponding content in the above embodiments, and will not be described herein.
In 502, the arrangement rules corresponding to different text types are generally different. For example, ancient text in the image to be identified is typically arranged from top to bottom and from right to left, whereas modern text in the image to be identified is typically arranged from left to right and from top to bottom.
In an example, the text types may include ancient text types and modern text types.
In 503, the arrangement rule corresponding to the text type may be obtained according to the corresponding relationship between the text type and the arrangement rule established in advance. The arrangement rules from top to bottom and from right to left can be configured in advance for ancient text types, and from left to right and from top to bottom for modern text types.
In the above 504, the plurality of text information to be ordered may be ordered according to the arrangement rule and the positions of the plurality of text information to be ordered in the image to be processed, so as to obtain a text information sequence to be ordered.
In this embodiment, the text information is ordered according to different arrangement rules for different text types, so that the applicability and accuracy of the ordering scheme can be effectively improved.
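Steps 503 and 504 amount to looking up a sort rule by text type and applying it to the item positions. The sketch below assumes each item is a dictionary with hypothetical keys `text`, `x`, and `y`, that y grows downward as in image coordinates, and that items in the same row or column share exactly the same coordinate; real layouts would need some tolerance when grouping rows and columns.

```python
# Hypothetical pre-established mapping from text type to a sort key.
ARRANGEMENT_RULES = {
    # Ancient text: columns from right to left, top to bottom in a column.
    "ancient": lambda item: (-item["x"], item["y"]),
    # Modern text: rows from top to bottom, left to right within a row.
    "modern": lambda item: (item["y"], item["x"]),
}


def sort_by_rule(items, text_type):
    """Order text items by the arrangement rule for the given text type."""
    return sorted(items, key=ARRANGEMENT_RULES[text_type])
```

For a 2 x 2 grid of characters, the modern rule reads the top row left to right and then the bottom row, while the ancient rule reads the right column top to bottom and then the left column.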
It should be noted that, in the method provided in the embodiment of the present application, details of each step may be referred to corresponding details in the above embodiment, which are not described herein. In addition, the method provided in the embodiment of the present application may further include other part or all of the steps in the above embodiments, and specific reference may be made to the corresponding content of the above embodiments, which is not repeated herein.
Fig. 2 is a flowchart of a text sorting method according to another embodiment of the present application. The method may be executed by a client or a server. The client may be hardware integrated in a terminal and provided with an embedded program, application software installed in the terminal, or tool software embedded in the terminal's operating system, which is not limited in the embodiments of the present application. The terminal may be any terminal device, including a mobile phone, a tablet computer, a smart speaker, and the like. The server may be an ordinary server, a cloud server, a virtual server, or the like, which is not specifically limited in the embodiments of the present application.
As shown in fig. 2, the method includes:
201. and acquiring a plurality of text information to be sequenced.
202. And integrating the respective corresponding characteristics of the plurality of word information to be ordered and the adjacent relation among the plurality of word information to be ordered to determine the reading sequence of the plurality of word information to be ordered.
203. And sequencing the plurality of first text messages according to the reading sequence to obtain a first text message sequence.
In the above 201, the plurality of text information to be ordered may be identified from an image to be identified, or may be input by a user.
For example, a text ordering function can be embedded in a tutoring device for primary school pupils. When a pupil encounters an exercise that asks for words to be arranged into a sentence, the pupil can enter the words from the question, that is, the plurality of text information to be ordered, into the tutoring device.
In 202, the adjacencies between the text information to be ordered may be semantic adjacencies, or may be adjacencies in other aspects, which is not limited in particular in the embodiments of the present application.
The specific implementation process of the foregoing 202 and 203 may be referred to the corresponding content in the foregoing embodiments, which is not repeated herein.
According to the technical scheme provided by the embodiment of the application, when the plurality of text messages to be sequenced are read and sequenced, the characteristics of each text message are considered, and the adjacent relation among the plurality of text messages to be sequenced is considered, so that the accuracy of sequencing can be effectively improved, and the semantic relevance of the finally obtained text message sequence is improved.
Optionally, the method may further include:
204. and determining the adjacent relation among the plurality of first text messages according to the corresponding characteristics of the plurality of first text messages.
The specific implementation of 204 may be referred to the corresponding content in the above embodiments, and will not be described herein.
Optionally, in the foregoing 202, "the reading sequence of the plurality of text information to be ordered is determined by integrating the features corresponding to each of the plurality of text information to be ordered and the adjacency relations between the plurality of text information to be ordered", which may be implemented specifically by the following steps:
2021. And constructing a graph structure with nodes and edges according to the adjacent relation among the plurality of text information to be ordered.
The nodes in the graph structure are used for representing text information to be sequenced, and the edges in the graph structure are used for representing whether the nodes are adjacent or not.
2022. And taking the characteristics corresponding to the plurality of text information to be ordered and the graph structure as the input of a trained graph convolution neural network model, and executing the graph convolution neural network model to obtain the reading sequence of the plurality of text information to be ordered.
The specific implementation process of 2021 and 2022 may be referred to the corresponding content in each embodiment, which is not described herein.
It should be noted that, in the method provided in the embodiment of the present application, details of each step may be referred to corresponding details in the above embodiment, which are not described herein. In addition, the method provided in the embodiment of the present application may further include other part or all of the steps in the above embodiments, and specific reference may be made to the corresponding content of the above embodiments, which is not repeated herein.
Fig. 3 is a block diagram showing an image recognition apparatus according to an embodiment of the present application. As shown in fig. 3, the apparatus includes:
The first identifying module 301 is configured to identify, from the image to be identified, a plurality of text information to be ordered contained therein.
The first determining module 302 is configured to determine a reading order of the plurality of text messages to be ordered according to the features corresponding to the plurality of text messages to be ordered.
Wherein the features carry semantic features.
The first sorting module 303 is configured to sort the plurality of text messages to be sorted according to the reading order, so as to obtain a text message sequence to be sorted.
Optionally, the apparatus may further include:
The first acquisition module is used for acquiring a sample image and expected text information sequences corresponding to a plurality of sample text information contained in the sample image;
And the first optimizing module is used for optimizing the graph convolutional neural network model according to the sample features corresponding to the plurality of sample text information, the graph structure corresponding to the plurality of sample text information, and the expected text information sequence.
Optionally, the apparatus may further include:
The first extraction module is used for extracting semantic features corresponding to the identified text information to be sequenced respectively and extracting visual features corresponding to the text information to be sequenced respectively;
And the first fusion module is used for fusing the semantic features and the visual features corresponding to the plurality of text information to be sequenced to obtain the features corresponding to the plurality of text information to be sequenced.
It should be noted that, the image recognition device provided in the foregoing embodiments may implement the technical solutions described in the foregoing method embodiments, and the specific implementation principles of the foregoing modules or units may be referred to the corresponding content in the foregoing method embodiments, which is not repeated herein.
Fig. 4 is a block diagram of a text sorting apparatus according to an embodiment of the present application. As shown in fig. 4, the apparatus includes:
A second obtaining module 401, configured to obtain a plurality of text information to be ordered;
A second determining module 402, configured to synthesize features corresponding to each of the plurality of text information to be ordered and an adjacency relationship between the plurality of text information to be ordered, and determine a reading sequence of the plurality of text information to be ordered;
and the second sorting module 403 is configured to sort the plurality of first text messages according to the reading order, so as to obtain a first text message sequence.
Optionally, the apparatus may further include:
And the third determining module is used for determining the adjacent relation among the plurality of first text messages according to the characteristics corresponding to the plurality of first text messages.
It should be noted that, the text sorting device provided in the foregoing embodiments may implement the technical solutions described in the foregoing method embodiments, and the specific implementation principles of the foregoing modules or units may refer to corresponding contents in the foregoing method embodiments, which are not repeated herein.
Still another embodiment of the present application provides an image recognition apparatus including:
the second recognition module is used for recognizing a plurality of text information to be sequenced from the images to be recognized.
And the fourth determining module is used for determining the text types of the plurality of text information to be sequenced.
And the third acquisition module is used for acquiring the arrangement rule corresponding to the text type.
And the third ordering module is used for ordering the plurality of word information to be ordered according to the ordering rule to obtain a word information sequence to be ordered.
It should be noted that, the image recognition device provided in the foregoing embodiments may implement the technical solutions described in the foregoing method embodiments, and the specific implementation principles of the foregoing modules or units may be referred to the corresponding content in the foregoing method embodiments, which is not repeated herein.
Fig. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown, the electronic device includes a memory 1101 and a processor 1102. The memory 1101 may be configured to store various other data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on an electronic device. The memory 1101 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The memory is used for storing programs;
the processor 1102 is coupled to the memory 1101, and is configured to execute the program stored in the memory 1101, so as to implement the image recognition method or the text sorting method in the above embodiments.
Further, as shown in FIG. 5, the electronic device also includes a communication component 1103, a display 1104, a power supply component 1105, an audio component 1106, and other components. Only some of the components are schematically shown in fig. 5, which does not mean that the electronic device only comprises the components shown in fig. 5.
Accordingly, the embodiments of the present application also provide a computer-readable storage medium storing a computer program, where the computer program when executed by a computer can implement the steps or functions of the image recognition method and the text sorting method provided in the foregoing embodiments.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present application, and not for limiting the same, and although the present application has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present application.

Claims (16)

1. An image recognition method, comprising:
identifying, from an image to be recognized, a plurality of text information to be sorted contained therein;
using the features corresponding to each of the plurality of text information to be sorted, together with a graph structure, as inputs of a trained graph convolutional neural network model, and executing the graph convolutional neural network model to obtain a reading order of the plurality of text information to be sorted; wherein the features carry semantic features; the graph structure is constructed according to the adjacency relationship between the plurality of text information to be sorted; the nodes in the graph structure are used to represent the text information to be sorted; and the edges in the graph structure are used to represent whether the nodes are adjacent;
sorting the plurality of text information to be sorted according to the reading order to obtain a sequence of text information to be sorted.

2. The method according to claim 1, further comprising:
determining the adjacency relationship between the plurality of text information to be sorted.

3. The method according to claim 2, further comprising:
constructing a graph structure having nodes and edges according to the adjacency relationship between the plurality of text information to be sorted.

4. The method according to claim 3, characterized in that the graph convolutional neural network model is used to:
obtain, through a graph convolution operation, the updated features corresponding to each of the plurality of text information to be sorted according to the features corresponding to each of the plurality of text information to be sorted and the graph structure;
determine the reading order of the plurality of text information to be sorted according to the updated features corresponding to each of the plurality of text information to be sorted.

5. The method according to claim 4, characterized in that determining the reading order of the plurality of text information to be sorted according to the updated features corresponding to each of the plurality of text information to be sorted comprises:
integrating the updated features corresponding to each of the plurality of text information to be sorted to calculate a global text information feature as the starting reference feature;
calculating, according to the reference feature and the updated feature corresponding to each of at least one text information to be sorted that has not yet been output among the plurality of text information to be sorted, the attention weight corresponding to each of the at least one text information to be sorted that has not yet been output;
outputting the text information to be sorted corresponding to the maximum attention weight, using the updated feature corresponding to that text information to be sorted as the new reference feature, and continuing to perform the above attention weight calculation step until all of the plurality of text information to be sorted have been output;
determining the output order of the plurality of text information to be sorted as the reading order of the plurality of text information to be sorted.

6. The method according to any one of claims 3 to 5, further comprising:
acquiring a sample image and an expected text information sequence corresponding to a plurality of sample text information contained in the sample image;
optimizing the graph convolutional neural network model according to the sample features corresponding to each of the plurality of sample text information, the graph structure corresponding to the plurality of sample text information, and the expected text information sequence.

7. The method according to any one of claims 2 to 5, characterized in that determining the adjacency relationship between the plurality of text information to be sorted comprises:
determining the adjacency relationship between the plurality of text information to be sorted according to the features corresponding to each of the plurality of text information to be sorted.

8. The method according to claim 7, characterized in that determining the adjacency relationship between the plurality of text information to be sorted according to the features corresponding to each of the plurality of text information to be sorted comprises:
calculating the correlation between each two pieces of text information to be sorted according to the features corresponding to each two pieces of text information to be sorted among the plurality of text information to be sorted;
determining whether each two pieces of text information to be sorted are adjacent according to the correlation between them.

9. The method according to claim 8, characterized in that the features also carry visual features;
calculating the correlation between each two pieces of text information to be sorted according to the features corresponding to each two pieces of text information to be sorted among the plurality of text information to be sorted comprises:
calculating a first similarity between the semantic features corresponding to each two pieces of text information to be sorted;
calculating a second similarity between the visual features corresponding to each two pieces of text information to be sorted;
combining the first similarity and the second similarity to determine the correlation between each two pieces of text information to be sorted.

10. 
The method according to any one of claims 1 to 5, further comprising: 提取识别出的所述多个待排序文字信息各自对应的语义特征;Extracting semantic features corresponding to each of the identified plurality of text information to be sorted; 提取所述多个待排序文字信息各自对应的视觉特征;Extracting visual features corresponding to each of the plurality of text information to be sorted; 融合所述多个待排序文字信息各自对应的语义特征以及视觉特征,得到所述多个待排序文字信息各自对应的特征。The semantic features and visual features corresponding to the plurality of text information to be sorted are integrated to obtain the features corresponding to the plurality of text information to be sorted. 11.根据权利要求10所述的方法,其特征在于,提取所述多个待排序文字信息各自对应的视觉特征,包括:11. The method according to claim 10, characterized in that extracting the visual features corresponding to each of the plurality of text information to be sorted comprises: 根据所述多个待排序文字信息各自在所述待识别图像中的位置,从所述待识别图像中确定出所述多个待排序文字信息各自所在子图像区域;According to the positions of the plurality of text information to be sorted in the image to be identified, determining the sub-image regions where the plurality of text information to be sorted are located from the image to be identified; 分别根据所述多个待排序文字信息各自所在子图像区域,提取所述多个待排序文字信息各自对应的视觉特征。Visual features corresponding to the plurality of text information to be sorted are extracted respectively according to the sub-image regions where the plurality of text information to be sorted are respectively located. 12.根据权利要求1至5中任一项所述的方法,其特征在于,从待识别图像中,识别出其中所包含的多个待排序文字信息,包括:12. 
The method according to any one of claims 1 to 5, characterized in that identifying a plurality of text information to be sorted contained in the image to be identified comprises: 从待识别图像中,识别出其中所包含的多个文字信息以及各文字信息在所述待识别图像中的位置;Identify, from the image to be identified, a plurality of text information contained therein and the position of each text information in the image to be identified; 根据各文字信息在所述待识别图像中的位置,利用聚类算法对所述多个文字信息进行划分,得到至少一个文字信息簇;According to the position of each piece of text information in the image to be recognized, the plurality of text information are divided by using a clustering algorithm to obtain at least one text information cluster; 从所述至少一个文字信息簇中选取其中一个文字信息簇中的多个文字信息作为所述多个待排序文字信息。A plurality of text information in one of the at least one text information cluster is selected as the plurality of text information to be sorted. 13.一种文字排序方法,其特征在于,包括:13. A method for sorting characters, comprising: 获取多个待排序文字信息;Get multiple text information to be sorted; 综合所述多个待排序文字信息各自对应的特征以及所述多个待排序文字信息间的邻接关系,确定所述多个待排序文字信息的阅读顺序,包括:根据所述多个待排序文字信息间的邻接关系,构建具有节点和边的图结构;所述图结构中的节点用来表示待排序文字信息;所述图结构中的边用来表示节点间是否邻接;将所述多个待排序文字信息各自对应的特征以及所述图结构作为训练好的图卷积神经网络模型的输入,执行所述图卷积神经网络模型,以获得所述多个待排序文字信息的阅读顺序;Determining the reading order of the plurality of text information to be sorted by comprehensively considering the features corresponding to each of the plurality of text information to be sorted and the adjacency relationship between the plurality of text information to be sorted, including: constructing a graph structure with nodes and edges according to the adjacency relationship between the plurality of text information to be sorted; the nodes in the graph structure are used to represent the text information to be sorted; the edges in the graph structure are used to indicate whether the nodes are adjacent; using the features corresponding to each of the plurality of text information to be sorted and the graph structure as inputs of a trained graph convolutional neural network model, 
executing the graph convolutional neural network model, so as to obtain the reading order of the plurality of text information to be sorted; 按照所述阅读顺序,对所述多个待排序文字信息进行排序,得到第一文字信息序列。The plurality of text information to be sorted are sorted according to the reading order to obtain a first text information sequence. 14.根据权利要求13所述的方法,其特征在于,还包括:14. The method according to claim 13, further comprising: 根据所述多个待排序文字信息各自对应的特征,确定所述多个待排序文字信息间的邻接关系。The adjacency relationship between the multiple text information to be sorted is determined according to the features corresponding to each of the multiple text information to be sorted. 15.一种电子设备,其特征在于,包括:存储器和处理器,其中,15. An electronic device, comprising: a memory and a processor, wherein: 所述存储器,用于存储程序;The memory is used to store programs; 所述处理器,与所述存储器耦合,用于执行所述存储器中存储的所述程序,以用于:The processor is coupled to the memory and is configured to execute the program stored in the memory to: 从待识别图像中,识别出其中所包含的多个待排序文字信息;Identify multiple text information to be sorted contained in the image to be identified; 将所述多个待排序文字信息各自对应的特征以及图结构作为训练过的图卷积神经网络模型的输入,执行所述图卷积神经网络模型,以获得所述多个待排序文字信息的阅读顺序;其中,所述特征中携带有语义特征;所述图结构是根据所述多个待排序文字信息间的邻接关系构建的,所述图结构中的节点用来表示待排序文字信息;所述图结构中的边用来表示节点间是否邻接;The features and graph structures corresponding to the plurality of text information to be sorted are used as inputs of a trained graph convolutional neural network model, and the graph convolutional neural network model is executed to obtain a reading order of the plurality of text information to be sorted; wherein the features carry semantic features; the graph structure is constructed according to the adjacency relationship between the plurality of text information to be sorted, and the nodes in the graph structure are used to represent the text information to be sorted; and the edges in the graph structure are used to represent whether the nodes are adjacent; 按照所述阅读顺序,对所述多个待排序文字信息进行排序,得到待排序文字信息序列。The plurality of text information to be sorted are sorted according to the reading order 
to obtain a sequence of text information to be sorted. 16.一种电子设备,其特征在于,包括:存储器和处理器,其中,16. An electronic device, comprising: a memory and a processor, wherein: 所述存储器,用于存储程序;The memory is used to store programs; 所述处理器,与所述存储器耦合,用于执行所述存储器中存储的所述程序,以用于:The processor is coupled to the memory and is configured to execute the program stored in the memory to: 获取多个待排序文字信息;Get multiple text information to be sorted; 综合所述多个待排序文字信息各自对应的特征以及所述多个待排序文字信息间的邻接关系,确定所述多个待排序文字信息的阅读顺序,包括:根据所述多个待排序文字信息间的邻接关系,构建具有节点和边的图结构;所述图结构中的节点用来表示待排序文字信息;所述图结构中的边用来表示节点间是否邻接;将所述多个待排序文字信息各自对应的特征以及所述图结构作为训练好的图卷积神经网络模型的输入,执行所述图卷积神经网络模型,以获得所述多个待排序文字信息的阅读顺序;Determining the reading order of the plurality of text information to be sorted by comprehensively considering the features corresponding to each of the plurality of text information to be sorted and the adjacency relationship between the plurality of text information to be sorted, including: constructing a graph structure with nodes and edges according to the adjacency relationship between the plurality of text information to be sorted; the nodes in the graph structure are used to represent the text information to be sorted; the edges in the graph structure are used to indicate whether the nodes are adjacent; using the features corresponding to each of the plurality of text information to be sorted and the graph structure as inputs of a trained graph convolutional neural network model, executing the graph convolutional neural network model, so as to obtain the reading order of the plurality of text information to be sorted; 按照所述阅读顺序,对所述多个待排序文字信息进行排序,得到第一文字信息序列。The plurality of text information to be sorted are sorted according to the reading order to obtain a first text information sequence.
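
The pipeline recited in claims 4, 5, 8 and 9 can be illustrated with a small, dependency-free Python sketch. It is only a toy under stated assumptions, not the patented implementation: the claims do not fix a similarity measure, a way of combining the two similarities, a graph convolution formula, or an attention function, so cosine similarity, the mixing weight alpha, the adjacency threshold, mean aggregation without learned weights, and dot-product attention are all illustrative choices, and every function name here is hypothetical. A trained model would use learned parameters in place of these fixed rules.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    # Assumed similarity measure; the claims only say "similarity".
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def build_adjacency(semantic, visual, alpha=0.5, threshold=0.5):
    """Claims 8-9: pairwise correlation from semantic and visual similarity.

    alpha (mixing weight) and threshold are assumptions, not from the claims.
    """
    n = len(semantic)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            first = cosine(semantic[i], semantic[j])   # first similarity
            second = cosine(visual[i], visual[j])      # second similarity
            if alpha * first + (1 - alpha) * second >= threshold:
                adj[i][j] = adj[j][i] = 1              # edge: nodes adjacent
    return adj


def graph_conv(features, adj):
    """Claim 4: one graph convolution step producing updated features.

    Assumption: plain mean over a node and its neighbours, no learned weights.
    """
    n, d = len(features), len(features[0])
    updated = []
    for i in range(n):
        group = [i] + [j for j in range(n) if adj[i][j]]
        updated.append(
            [sum(features[j][k] for j in group) / len(group) for k in range(d)]
        )
    return updated


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_reading_order(updated):
    """Claim 5: greedy attention-based decoding of the reading order."""
    n, d = len(updated), len(updated[0])
    # Global feature (assumed: mean of all updated features) as initial reference.
    reference = [sum(f[k] for f in updated) / n for k in range(d)]
    remaining, order = list(range(n)), []
    while remaining:
        # Attention weights over the items not yet output (assumed: dot product).
        weights = softmax([dot(reference, updated[i]) for i in remaining])
        best = remaining[max(range(len(remaining)), key=lambda k: weights[k])]
        order.append(best)
        remaining.remove(best)
        reference = updated[best]  # chosen item's feature becomes the new reference
    return order


if __name__ == "__main__":
    semantic = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    visual = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    fused = [s + v for s, v in zip(semantic, visual)]  # claim 10: fusion by concatenation (assumed)
    adjacency = build_adjacency(semantic, visual)
    print(decode_reading_order(graph_conv(fused, adjacency)))
```

The decoding loop never revisits an item that has already been output, so the result is always a permutation of the input indices, which mirrors the "until all have been output" condition in claim 5.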
CN202010106180.7A 2020-02-20 2020-02-20 Image recognition, text sorting method and device Active CN113283432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010106180.7A CN113283432B (en) 2020-02-20 2020-02-20 Image recognition, text sorting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010106180.7A CN113283432B (en) 2020-02-20 2020-02-20 Image recognition, text sorting method and device

Publications (2)

Publication Number Publication Date
CN113283432A CN113283432A (en) 2021-08-20
CN113283432B true CN113283432B (en) 2025-04-04

Family

ID=77275325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010106180.7A Active CN113283432B (en) 2020-02-20 2020-02-20 Image recognition, text sorting method and device

Country Status (1)

Country Link
CN (1) CN113283432B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114239598B (en) * 2021-12-17 2024-12-03 上海高德威智能交通系统有限公司 Method, device, electronic device and storage medium for determining text element reading order
CN114495147B (en) * 2022-01-25 2023-05-05 北京百度网讯科技有限公司 Identification method, device, equipment and storage medium
CN116030468A (en) * 2022-12-27 2023-04-28 科大讯飞股份有限公司 Method and device for determining reading order, electronic device and storage medium
CN116071740B (en) * 2023-03-06 2023-07-04 深圳前海环融联易信息科技服务有限公司 Invoice identification method, computer equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077270A (en) * 2013-03-29 2014-10-01 富士胶片株式会社 Electronic book production apparatus, electronic book system and electronic book production method

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100568221C (en) * 2004-11-22 2009-12-09 北京北大方正技术研究院有限公司 A Method for Restoring the Reading Order of the Newspaper Layout
US7639881B2 (en) * 2005-06-13 2009-12-29 Microsoft Corporation Application of grammatical parsing to visual recognition tasks
CN101206639B (en) * 2007-12-20 2012-05-23 北大方正集团有限公司 An Indexing Method for Complicated Layout Based on PDF
US9811727B2 (en) * 2008-05-30 2017-11-07 Adobe Systems Incorporated Extracting reading order text and semantic entities
CN101866418B (en) * 2009-04-17 2013-02-27 株式会社理光 Method and device for determining document reading order
CN102280104B (en) * 2010-06-11 2013-05-01 北大方正集团有限公司 File phoneticization processing method and system based on intelligent indexing
JP5699570B2 (en) * 2010-11-30 2015-04-15 富士ゼロックス株式会社 Image processing apparatus and image processing program
CN103729638B (en) * 2012-10-12 2016-12-21 阿里巴巴集团控股有限公司 A kind of literal line arrangement analysis method and apparatus in character area identification
JP6204076B2 (en) * 2013-06-10 2017-09-27 エヌ・ティ・ティ・コミュニケーションズ株式会社 Text area reading order determination apparatus, text area reading order determination method, and text area reading order determination program
CN104318340B (en) * 2014-09-25 2017-07-07 中国科学院软件研究所 Information visualization methods and intelligent visible analysis system based on text resume information
CN106485186B (en) * 2015-08-26 2020-02-18 阿里巴巴集团控股有限公司 Image feature extraction method and device, terminal equipment and system
US20170083196A1 (en) * 2015-09-23 2017-03-23 Google Inc. Computer-Aided Navigation of Digital Graphic Novels
CN105574524B (en) * 2015-12-11 2018-10-19 北京大学 Based on dialogue and divide the mirror cartoon image template recognition method and system that joint identifies
CN108334805B (en) * 2017-03-08 2020-04-03 腾讯科技(深圳)有限公司 Method and device for detecting document reading sequence
CN108287858B (en) * 2017-03-02 2021-08-10 腾讯科技(深圳)有限公司 Semantic extraction method and device for natural language
US10423828B2 (en) * 2017-12-15 2019-09-24 Adobe Inc. Using deep learning techniques to determine the contextual reading order in a form document
CN110609902B (en) * 2018-05-28 2021-10-22 华为技术有限公司 A text processing method and device based on fusion knowledge graph
CN109117477B (en) * 2018-07-17 2022-01-28 广州大学 Chinese field-oriented non-classification relation extraction method, device, equipment and medium
CN109657221B (en) * 2018-12-13 2023-08-01 北京金山数字娱乐科技有限公司 Document paragraph sorting method, sorting device, electronic equipment and storage medium
CN109636049B (en) * 2018-12-19 2021-10-29 浙江工业大学 A Congestion Index Prediction Method Combining Road Network Topology and Semantic Correlation
CN109816009B (en) * 2019-01-18 2021-08-10 南京旷云科技有限公司 Multi-label image classification method, device and equipment based on graph convolution
CN110162653B (en) * 2019-05-13 2021-07-30 北京百度网讯科技有限公司 A kind of image and text sorting recommendation method and terminal device
CN110337016B (en) * 2019-06-13 2020-08-14 山东大学 Short video personalized recommendation method, system, readable storage medium and computer equipment based on multimodal graph convolutional network
CN110363190A (en) * 2019-07-26 2019-10-22 中国工商银行股份有限公司 A kind of character recognition method, device and equipment
CN110728151B (en) * 2019-10-23 2024-03-12 深圳报业集团 Information depth processing method and system based on visual characteristics

Also Published As

Publication number Publication date
CN113283432A (en) 2021-08-20

Similar Documents

Publication Publication Date Title
US11055566B1 (en) Utilizing a large-scale object detector to automatically select objects in digital images
CN109960734B (en) Question Answering for Data Visualization
CN113283432B (en) Image recognition, text sorting method and device
US10055391B2 (en) Method and apparatus for forming a structured document from unstructured information
JP5134628B2 (en) Media material analysis of consecutive articles
JP2022541199A (en) A system and method for inserting data into a structured database based on image representations of data tables.
CN114155543A (en) Neural network training method, document image understanding method, apparatus and device
CN113343012B (en) News matching method, device, equipment and storage medium
CN113569888B (en) Image annotation method, device, equipment and medium
CN114067343B (en) A data set construction method, model training method and corresponding device
CN115131803B (en) Method, device, computer equipment and storage medium for identifying document word size
CN113704623A (en) Data recommendation method, device, equipment and storage medium
CN116610304B (en) Page code generation method, device, equipment and storage medium
CN113936187A (en) Text image synthesis method and device, storage medium and electronic equipment
CN115063784A (en) Bill image information extraction method and device, storage medium and electronic equipment
US11869127B2 (en) Image manipulation method and apparatus
CN116225956B (en) Automated testing method, apparatus, computer device and storage medium
CN117993493A (en) Script generation method, device, equipment and storage medium based on event graph
CN119066179B (en) Question and answer processing method, computer program product, device and medium
CN120071373A (en) Page-crossing cell merging method and device, electronic equipment and storage medium
CN113297411A (en) Method, device and equipment for measuring similarity of wheel-shaped atlas and storage medium
CN114780736B (en) Method and device for constructing customer service knowledge base
CN118096939A (en) System, method, equipment and medium for manufacturing AI digital business card based on 2D digital person
CN116363236A (en) Text-based mapping method, device, equipment and storage medium
CN116956052B (en) Application matching method and application matching device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant