CN117746432A

CN117746432A - Text splicing method and device, electronic equipment and storage medium

Info

Publication number: CN117746432A
Application number: CN202311583385.4A
Authority: CN
Inventors: 马坤; 吕晓丹; 金雯; 王波
Original assignee: Yuanbao Kechuang Beijing Technology Co ltd
Current assignee: Yuanbao Kechuang Beijing Technology Co ltd
Priority date: 2023-11-24
Filing date: 2023-11-24
Publication date: 2024-03-22

Abstract

The invention provides a text splicing method and device, an electronic device, and a storage medium. The method comprises: determining each text block in a target image; for each text block, determining a numbering mark of the text block and marking the numbering mark at a preset position corresponding to the text block, to obtain a marked target image; inputting the marked target image into a numbering and sorting model to obtain the numbering mark ordering output by the model, wherein the numbering and sorting model is obtained by training an initial numbering and sorting model on sample images and corresponding ordering labels, each sample image is an image to which sample numbering marks have been added, and each ordering label is obtained by ordering the sample numbering marks according to the sequence of the sample texts; and splicing the text blocks according to the numbering mark ordering to obtain a spliced text. The numbering marks enhance the image information, so that enhanced image features can be extracted; this helps the model predict the numbering mark ordering accurately, and a highly accurate spliced text can be obtained from that ordering.

Description

Text splicing method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of information processing technologies, and in particular, to a text splicing method, a text splicing device, an electronic device, and a storage medium.

Background

In practice, many business scenarios require spliced text. For example, in an image-to-text conversion scenario, all text in an image is first recognized, the recognized texts are then spliced into the complete text corresponding to the image, and the complete text is stored in a storage medium of an electronic device. As another example, in a structured extraction scenario for image information, a target text must be extracted from an image according to a service requirement; here too, the texts recognized in the image are spliced into a complete text, and the target text is extracted from that complete text.

In the related art, text is commonly spliced according to preset rules, for example from top to bottom and from left to right, based on the relative positions of the texts in an image. Because such rules are fixed, this approach cannot always splice text in the correct order. For example, if the reading order of part of the text in an image is not top to bottom and left to right, that part is arranged incorrectly in the spliced complete text; and if the reading order of the whole image is not top to bottom and left to right, the entire spliced text is wrong. The accuracy of text splicing with existing methods is therefore low.

Disclosure of Invention

The invention provides a text splicing method and device, an electronic device, and a storage medium, which overcome the low accuracy of text splicing in the prior art and improve the accuracy of text splicing.

The invention provides a text splicing method, which comprises the following steps:

determining each text block in the target image;

determining a numbering mark of each text block in the target image, and marking the numbering mark at a preset position corresponding to the text block to obtain a marked target image;

inputting the marked target image into a numbering and sorting model to obtain the numbering mark ordering output by the numbering and sorting model, wherein the numbering and sorting model is obtained by training an initial numbering and sorting model based on sample images and corresponding ordering labels, the sample images are obtained by adding sample numbering marks to the sample texts in the images, and the ordering labels are obtained by ordering the sample numbering marks according to the sequence of the sample texts;

and splicing the text blocks based on the numbering mark ordering to obtain a spliced text.

According to the text splicing method provided by the invention, the determining of the numbering mark of the text block comprises the following steps:

determining the height of the text block in the target image based on the position information of the text block in the target image;

determining a height of the numbering mark that matches the height of the text block;

randomly generating the numbering mark based on the height of the numbering mark.

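According to the text splicing method provided by the invention, the determining of the numbering mark of the text block comprises the following steps:

randomly generating a label corresponding to the text block;

determining the label and at least one character in the text block as the numbering mark of the text block.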

According to the text splicing method provided by the invention, the determining of the label and at least one character in the text block as the numbering mark of the text block comprises the following steps:

in the case that the at least one character comprises all characters in the text block, enclosing the text block in a minimum bounding frame to obtain a frame-selected text block;

determining the label and the frame-selected text block as the numbering mark of the text block.

According to the text splicing method provided by the invention, the determining of the numbering mark of the text block comprises the following steps:

determining the pixel color of each pixel point based on the pixel value of each pixel point in the target image;

determining the pixel color shared by the largest number of pixel points as the reference color of the target image;

determining a target color based on the reference color, wherein the color difference between the target color and the reference color is greater than a preset color difference;

determining the numbering mark of the text block based on the target color.

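According to the text splicing method provided by the invention, the determining of the numbering mark of the text block comprises the following steps:

randomly generating a label corresponding to the text block;

extracting a background image at the position of the label;

determining the background image and the label as the numbering mark of the text block.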

According to the text splicing method provided by the invention, the method further comprises the following steps:

receiving a text splicing request, wherein the text splicing request comprises information to be extracted;

and extracting target information corresponding to the information to be extracted from the spliced text based on the text splicing request.

The invention also provides a text splicing device, which comprises:

a first determining module, used for determining each text block in the target image;

a second determining module, used for determining, for each text block in the target image, the numbering mark of the text block, and marking the numbering mark at a preset position corresponding to the text block, to obtain a marked target image;

a processing module, used for inputting the marked target image into a numbering and sorting model to obtain the numbering mark ordering output by the numbering and sorting model, wherein the numbering and sorting model is obtained by training an initial numbering and sorting model based on sample images and corresponding ordering labels, the sample images are obtained by adding sample numbering marks to the sample texts in the images, and the ordering labels are obtained by ordering the sample numbering marks according to the sequence of the sample texts;

and a splicing module, used for splicing the text blocks based on the numbering mark ordering to obtain a spliced text.

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor implements any one of the text splicing methods when executing the program.

The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a text splicing method as described in any of the above.

The invention also provides a computer program product comprising a computer program which when executed by a processor implements a text splicing method as described in any of the above.

According to the text splicing method and device, electronic device, and storage medium provided by the invention, determining each text block in the target image yields the text blocks that need to be spliced. Determining a numbering mark for each text block and marking it at the preset position corresponding to that text block yields the marked target image; this adds image information to each text block in a targeted way, gives each text block more distinctive image features, and makes it easier for subsequent computer vision processing to recognize and extract the feature information of each text block. Inputting the marked target image into the numbering and sorting model yields the numbering mark ordering output by the model; the model applies computer vision processing to the marked target image and can extract image features with stronger representational power from it. Because the numbering and sorting model is obtained by training an initial numbering and sorting model on sample images and corresponding ordering labels, where the sample images are obtained by adding sample numbering marks to the sample texts and the ordering labels are obtained by ordering the sample numbering marks according to the sequence of the sample texts, the trained model can accurately predict the correct order of the numbering marks from those features. Splicing the text blocks corresponding to the numbering marks according to the numbering mark ordering then yields a correctly ordered spliced text, which improves the accuracy of text splicing.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a text splicing method according to an embodiment of the present invention;

FIG. 2 is a first schematic diagram of a marked target image according to an embodiment of the present invention;

FIG. 3 is a second schematic diagram of a marked target image according to an embodiment of the present invention;

FIG. 4 is a schematic flow chart of information extraction according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a text splicing device according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be noted that, in the present invention, the numbers of the described objects, such as "first", "second", etc., are only used to distinguish the described objects, and do not have any sequence or technical meaning.

Across scenarios that apply image information, many business requirements depend on correctly recognizing and splicing the text in an image; subsequent use of the image information is possible only once the text has been spliced correctly. High text splicing accuracy is therefore a precondition for these applications to work effectively.

Existing text splicing relies mainly on preset rules, so the validity of those rules largely determines the accuracy of the splicing. Whether simple or complex, a preset rule is a fixed rule algorithm designed from prior knowledge. In practice, different scenarios process images with different layouts, and the ordering of the text content varies widely; sorting and splicing text with fixed preset rules therefore produces a certain amount of incorrectly spliced text, and the text splicing accuracy is low.

In view of the above problems, an embodiment of the present invention provides a text splicing method, comprising: determining each text block in a target image; for each text block in the target image, determining a numbering mark of the text block and marking the numbering mark at a preset position corresponding to the text block, to obtain a marked target image; inputting the marked target image into a numbering and sorting model to obtain the numbering mark ordering output by the model, wherein the numbering and sorting model is obtained by training an initial numbering and sorting model on sample images and corresponding ordering labels, the sample images are obtained by adding sample numbering marks to the sample texts in the images, and the ordering labels are obtained by ordering the sample numbering marks according to the sequence of the sample texts; and splicing the text blocks based on the numbering mark ordering to obtain a spliced text. The method abandons rule-based text splicing and instead treats text splicing as a computer vision task, so that the trained numbering and sorting model can splice text with high accuracy across many types of target image. Because the initial numbering and sorting model is trained on sample images and their ordering labels, it accumulates experience in ordering numbering marks during learning; when it later encounters target images of various types, it can order their numbering marks accurately on the basis of that experience, and an accurate spliced text can then be obtained from the ordering it outputs.

The method of the embodiment of the invention also applies the idea of image information enhancement in both the model training stage and the model application stage. During training, sample numbering marks are added to the sample texts in the sample images to reinforce the information of each sample text and to act as a highlighted prompt for it. This makes it easier for the initial numbering and sorting model to extract image features with stronger representational power, which lowers the learning difficulty and improves the model's ability to order accurately. Correspondingly, during application, after each text block in the target image is determined, a numbering mark is marked at the preset position of each text block, which yields an image with enhanced image information, namely the marked target image. The numbering mark ordering can be predicted more accurately from the marked target image and the trained numbering and sorting model, and the corresponding text blocks can then be spliced in that order to obtain a more accurate spliced text. The text splicing method provided by the embodiment of the invention is described below with reference to FIG. 1 to FIG. 4.

FIG. 1 is a schematic flow chart of the text splicing method provided by an embodiment of the present invention. The method is applicable to any scenario in which text in an image is spliced. The method may be executed by an electronic device such as a smartphone, a computer, a server cluster, or a specially designed text splicing device, or by a text splicing apparatus arranged in the electronic device; the apparatus may be implemented in software, hardware, or a combination of the two. As shown in FIG. 1, the text splicing method includes steps 110 to 140.

Step 110: determining each text block in the target image.

Specifically, the target image is the image whose text is to be spliced. Each text block in the target image may be determined by any image-text recognition method, for example by an image recognition model or by an image-to-text conversion algorithm. A text block may be a single text character or a set of two or more text characters located close together.

For example, an optical character recognition (OCR) engine may be used to perform text recognition on the target image and thereby determine the text blocks in it. The OCR engine may be, for example, an open-source tool such as PaddleOCR or EasyOCR.
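As a minimal sketch of step 110, the following Python code shows one way the output of a text recognition step might be normalized into text block records. The `TextBlock` record, the `to_text_blocks` helper, and the `(quad, text)` input shape are assumptions made for illustration only; they are not part of the embodiment and do not reproduce the output format of any particular OCR engine.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class TextBlock:
    """One recognized text block: its text and axis-aligned bounding box."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def to_text_blocks(
    ocr_results: Sequence[Tuple[Sequence[Point], str]]
) -> List[TextBlock]:
    """Convert (quad, text) pairs into TextBlock records.

    The (quad, text) shape is a hypothetical, engine-neutral format:
    quad is a list of (x, y) vertices, text is the recognized string.
    """
    blocks = []
    for quad, text in ocr_results:
        xs = [x for x, _ in quad]
        ys = [y for _, y in quad]
        blocks.append(TextBlock(text, min(xs), min(ys), max(xs), max(ys)))
    return blocks


# Hypothetical recognition output for two text blocks.
sample_results = [
    ([(40, 10), (200, 10), (200, 34), (40, 34)], "Product name"),
    ([(40, 50), (120, 50), (120, 70), (40, 70)], "Warranty"),
]
blocks = to_text_blocks(sample_results)
```

Keeping the bounding box with each text block makes the position information available to the later steps that place and size the numbering marks.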

Step 120: for each text block in the target image, determining the numbering mark of the text block, and marking the numbering mark at a preset position corresponding to the text block, to obtain the marked target image.

Specifically, a numbering mark is a mark that can serve as a unique identification, for example a mark comprising at least one of a number, a word, a letter, or an identifier. The numbering mark of each text block may be determined, for example, by generating it randomly or sequentially with a mark generation algorithm, or by generating it according to a numbering mark criterion.

The preset position of a text block may be any position around the text block, for example to its left, above it, to its right, or below it; the embodiment of the present invention does not limit this. Once the numbering mark of every text block in the target image has been determined and marked, the marked target image is obtained.
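The choice of preset position can be illustrated with a short Python sketch that computes where a numbering mark would be drawn relative to a text block. The function name `mark_anchor`, the `gap` parameter, and the `(left, top, right, bottom)` box layout are hypothetical; the sketch does not clamp the result to the image bounds, which a practical implementation would likely need.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def mark_anchor(
    box: Box,
    mark_size: Tuple[float, float],
    side: str = "left",
    gap: float = 4.0,
) -> Tuple[float, float]:
    """Return the top-left corner at which to draw a numbering mark.

    box       -- bounding box of the text block
    mark_size -- (width, height) of the rendered mark
    side      -- which side of the text block the mark is placed on
    gap       -- assumed spacing between mark and text block, in pixels
    """
    left, top, right, bottom = box
    mark_w, mark_h = mark_size
    if side == "left":
        return (left - gap - mark_w, top)
    if side == "right":
        return (right + gap, top)
    if side == "top":
        return (left, top - gap - mark_h)
    if side == "bottom":
        return (left, bottom + gap)
    raise ValueError(f"unsupported side: {side!r}")


left_anchor = mark_anchor((40, 10, 200, 34), (12, 24), side="left")
bottom_anchor = mark_anchor((40, 10, 200, 34), (12, 24), side="bottom")
```

Using one fixed `side` for every text block, in both training and application, keeps the marking method consistent between the two stages.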

FIG. 2 is a first schematic diagram of a marked target image according to an embodiment of the present invention. As shown in FIG. 2, after each text block in the target image is determined, a numeric numbering mark may be determined for each text block and marked on the left side of the corresponding text block.

For example, sequential generation by the mark generation algorithm may proceed as follows: each text block in the target image is determined by the OCR engine; the text blocks are numbered in the order in which the OCR engine outputs them; each sequence number is determined as the numbering mark of the corresponding text block; and the numbering mark is marked at the preset position of that text block in the target image. This yields a marked target image such as the one shown in FIG. 2.

For example, random generation by the mark generation algorithm may proceed as follows: a mark generation program is provided, and for each text block the program randomly generates a mark that serves as a unique identification, such as a random number or symbol; each such mark is determined as the numbering mark of its text block.

Generation based on a numbering mark criterion may proceed as follows: a numbering mark criterion is set, and the numbering mark of each text block is generated according to it. The criterion may be, for example, that the text blocks are numbered sequentially according to the row and column in which each is located, and each resulting number is determined as the numbering mark of its text block.
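The three ways of generating numbering marks described above (sequential, random, and criterion-based) can be sketched in Python as follows. All function names, the `row_tolerance` parameter, the `row-column` mark format, and the range from which random numbers are drawn are assumptions made for illustration; the embodiment does not prescribe them.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def sequential_marks(boxes: List[Box]) -> List[str]:
    """Number blocks 1..n in the order received (e.g. OCR output order)."""
    return [str(i) for i in range(1, len(boxes) + 1)]


def random_marks(boxes: List[Box], rng: random.Random) -> List[str]:
    """Assign distinct random numbers, so marks carry no positional order."""
    count = len(boxes)
    return [str(n) for n in rng.sample(range(1, 10 * count + 1), count)]


def row_column_marks(boxes: List[Box], row_tolerance: float = 10.0) -> List[str]:
    """Number blocks as 'row-column' from their positions.

    Blocks whose top edge lies within row_tolerance of the first block
    of a row are treated as belonging to that row (an assumed criterion).
    """
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][1])
    rows: List[List[int]] = []
    for i in order:
        if rows and boxes[i][1] - boxes[rows[-1][0]][1] <= row_tolerance:
            rows[-1].append(i)
        else:
            rows.append([i])
    marks = [""] * len(boxes)
    for r, row in enumerate(rows, start=1):
        for c, i in enumerate(sorted(row, key=lambda j: boxes[j][0]), start=1):
            marks[i] = f"{r}-{c}"
    return marks


boxes = [(200, 10, 300, 30), (40, 12, 150, 32), (40, 60, 150, 80)]
seq = sequential_marks(boxes)
rnd = random_marks(boxes, random.Random(0))
grid = row_column_marks(boxes)
```

In every variant the marks within one image are distinct, so each can serve as a unique identification for its text block.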

Step 130: inputting the marked target image into a numbering and sorting model to obtain the numbering mark ordering output by the numbering and sorting model, wherein the numbering and sorting model is obtained by training an initial numbering and sorting model based on sample images and corresponding ordering labels, the sample images are obtained by adding sample numbering marks to the sample texts in the images, and the ordering labels are obtained by ordering the sample numbering marks according to the sequence of the sample texts.

Specifically, the numbering mark ordering is the sequence obtained by ordering the numbering marks of the text blocks in the target image. This sequence corresponds to the correct order of the text blocks in the target image, that is, to the order in which the text blocks should be read.

The initial numbering and sorting model may be an initial neural network model, for example a model comprising at least one of a convolutional neural network (CNN), a recurrent neural network (RNN), a long short-term memory (LSTM) network, and a deep neural network (DNN). The numbering and sorting model can be trained from the initial numbering and sorting model, the sample images, and the ordering labels corresponding to the sample images.

The numbering and sorting model may be a neural network model obtained by supervised training of the initial numbering and sorting model on the sample images and their ordering labels. To build a sample image, each text block in an image containing text is determined and a corresponding numbering mark is added to each text block; the text blocks are the sample texts and the added numbering marks are the sample numbering marks. Ordering the numbering marks in the correct order then gives the ground-truth label for training, namely the ordering label corresponding to the sample image. Using the same method of determining and marking numbering marks in the training stage and the application stage improves the accuracy of the numbering mark ordering output by the model in application. Using different methods in the two stages makes prediction harder for the model and reduces the accuracy of the numbering mark ordering it outputs.
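The construction of an ordering label from sample numbering marks can be sketched in Python as follows. The helper name `build_sorting_label` and the representation of the correct reading order as a list of text indices are assumptions; the sketch covers only label construction and not the training of the model itself.

```python
from typing import List, Sequence


def build_sorting_label(
    sample_marks: Sequence[str], reading_order: Sequence[int]
) -> List[str]:
    """Build the ground-truth ordering label for one sample image.

    sample_marks[i] -- numbering mark added to sample text i
    reading_order   -- indices of the sample texts in correct reading order
    """
    if sorted(reading_order) != list(range(len(sample_marks))):
        raise ValueError("reading_order must be a permutation of text indices")
    return [sample_marks[i] for i in reading_order]


# Three sample texts marked "7", "2" and "9"; text 2 is read first,
# then text 0, then text 1.
label = build_sorting_label(["7", "2", "9"], [2, 0, 1])
```

The label lists the marks, not the texts, which matches the model's task of outputting a numbering mark ordering.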

Having the numbering and sorting model output a numbering mark ordering reduces the difficulty and complexity of the task compared with directly splicing the text blocks in the target image. The model can be trained on sample images with a variety of layouts to improve its predictive ability and its stability and robustness in application. In addition, because the numbering mark ordering is predicted from the marked target image, neither rule-based text splicing nor layout analysis is needed, which reduces the risk of accuracy loss introduced by additional steps. Predicting the ordering from image features also helps the model train and converge, and so makes it more efficient to train a model suited to a given application scenario.

Step 140: splicing the text blocks based on the numbering mark ordering to obtain the spliced text.

Specifically, once the numbering mark ordering output by the numbering and sorting model is obtained, each text block can be retrieved through the correspondence between numbering marks and text blocks, and the text blocks can be arranged in the order given by the numbering mark ordering, which yields a more accurate spliced text.
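Step 140 can be sketched in Python as a lookup followed by a join. The function name `splice_text`, the dictionary representation of the correspondence between numbering marks and text blocks, and the `separator` parameter are assumptions made for illustration.

```python
from typing import Dict, Sequence


def splice_text(
    mark_to_text: Dict[str, str],
    predicted_order: Sequence[str],
    separator: str = "",
) -> str:
    """Splice text blocks in the order given by the numbering mark ordering.

    mark_to_text    -- correspondence between numbering marks and block text
    predicted_order -- numbering mark ordering output by the model
    """
    missing = [m for m in predicted_order if m not in mark_to_text]
    if missing:
        raise KeyError(f"marks with no matching text block: {missing}")
    return separator.join(mark_to_text[m] for m in predicted_order)


spliced = splice_text(
    {"7": "world", "2": "Hello"},
    predicted_order=["2", "7"],
    separator=" ",
)
```

Raising an error for an unknown mark, rather than skipping it, is one possible design choice; it surfaces cases where the model outputs a mark that was never placed on the image.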

According to the text splicing method provided by the embodiment of the invention, determining each text block in the target image yields the text blocks that need to be spliced. Determining a numbering mark for each text block and marking it at the preset position corresponding to that text block yields the marked target image; this adds image information to each text block in a targeted way, gives each text block more distinctive image features, and makes it easier for subsequent computer vision processing to recognize and extract the feature information of each text block. Inputting the marked target image into the numbering and sorting model yields the numbering mark ordering output by the model; the model applies computer vision processing to the marked target image and can extract image features with stronger representational power from it. Because the numbering and sorting model is obtained by training an initial numbering and sorting model on sample images and corresponding ordering labels, where the sample images are obtained by adding sample numbering marks to the sample texts and the ordering labels are obtained by ordering the sample numbering marks according to the sequence of the sample texts, the trained model can accurately predict the correct order of the numbering marks from those features. Splicing the text blocks corresponding to the numbering marks according to the numbering mark ordering then yields a correctly ordered spliced text, which improves the accuracy of text splicing.

For example, to help the model recognize and extract image features more accurately, the font height of a numbering mark may be matched to the height of its text block when the numbering mark is determined. This reduces the difficulty of model inference and improves the accuracy of the resulting numbering mark ordering.

In an embodiment, the numbering mark of a text block may be determined as follows: determining the height of the text block in the target image based on the position information of the text block in the target image; determining a height of the numbering mark that matches the height of the text block; and randomly generating the numbering mark based on the height of the numbering mark.

For example, the position information of a text block in the target image is information that characterizes where the text block is located. It may include the position coordinates of each vertex of the text block, the position coordinates of its center point, and its length and height.

When text blocks are determined by an image recognition model, the task of determining text block position information can be added to the model, so that the position information of each text block is obtained at the same time as its content is recognized. Alternatively, when an OCR engine performs the image-to-text conversion, the position information of each text block in the target image can be obtained from the OCR engine.

To determine the height of a text block from its position information, the difference between the position coordinates of its vertices can be calculated to give the height of the text block in the target image. Alternatively, when the position information already includes the height of the text block, that value can be used directly.

To determine a numbering mark height that matches the height of the text block, the height of the numbering mark may be set equal to the height of the text block, or the font size of the numbering mark may be set close to the height of the text block, so that the font size or height of the numbering mark adapts to the height of the text block.
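The height calculation and matching described above can be sketched in Python as follows. The function names, the `scale` factor, and the `min_height` floor are assumptions added for illustration; the embodiment only requires that the height of the numbering mark match, or be close to, the height of the text block.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def block_height(vertices: Sequence[Point]) -> float:
    """Height of a text block as the difference of its vertex y-coordinates."""
    ys = [y for _, y in vertices]
    return max(ys) - min(ys)


def mark_height_for(
    text_block_height: float, scale: float = 1.0, min_height: float = 8.0
) -> float:
    """Height of the numbering mark matched to the text block height.

    scale = 1.0 makes the mark exactly as tall as the text block; the
    min_height floor is an assumed safeguard so that marks on very small
    text blocks stay legible.
    """
    return max(min_height, text_block_height * scale)


height = block_height([(40, 10), (200, 10), (200, 34), (40, 34)])
matched = mark_height_for(height)
```

The matched height could then be passed as the font size when the numbering mark is rendered onto the target image.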

Randomly generating the numbering mark based on its height means that, once the numbering mark height matching the text block has been determined, a random mark, for example a random number, is generated at that height. Note that if, in the training stage of the initial numbering and sorting model, the text blocks in each image are numbered from top to bottom and from left to right in ascending numerical order, the trained numbering and sorting model may output an overfitted numbering mark ordering. When the sample texts in a sample image are marked in numerical sequence, the sample numbering marks follow a mathematical ordering rule. Training the initial numbering and sorting model on a large number of such sample images may cause it to converge prematurely, so that the trained model simply orders the numbering marks by their numerical value; the output ordering is then overfitted and its accuracy falls. Randomly generated numbering marks avoid this overfitting. When the numbering marks of the text blocks are determined by random generation under the same rule in both training and application, the model predicts better and the numbering mark ordering it outputs is more accurate.

In both the training stage and the application stage, the numbering marks of the text blocks may be determined using height-adaptive numbering marks or randomly generated numbering marks. Within a single target image or sample image, the randomly generated numbering marks remain unique and distinct from one another.
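The difference between sequential and random numbering marks can be illustrated with the following Python sketch. The function names and the `upper` bound on the random numbers are hypothetical. The check `leaks_numeric_order` simply tests whether the marks, listed in the correct reading order, are already in ascending numerical order, which is the shortcut a model could learn instead of the layout.

```python
import random
from typing import List, Sequence


def unique_random_marks(count: int, rng: random.Random, upper: int = 99) -> List[int]:
    """Draw `count` distinct random numbers in 1..upper as numbering marks."""
    if count > upper:
        raise ValueError("not enough distinct numbers for the requested count")
    return rng.sample(range(1, upper + 1), count)


def leaks_numeric_order(marks_in_reading_order: Sequence[int]) -> bool:
    """True when the marks, read in the correct order, are already ascending.

    In that case the correct ordering can be recovered by sorting the
    numbers alone, without looking at the image layout.
    """
    return list(marks_in_reading_order) == sorted(marks_in_reading_order)


sequential = [1, 2, 3, 4, 5]            # marks assigned in reading order
randomized = unique_random_marks(5, random.Random(0))

sequential_leaks = leaks_numeric_order(sequential)
```

With sequential marks the check is always true, whereas with random marks it is true only by coincidence, so the numerical value of a mark gives the model no reliable cue about reading order.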

In this embodiment, the height of the text block in the target image is determined from the position information of the text block; a numbering mark height matching the height of the text block is determined; and the numbering mark is randomly generated at that height. On the one hand, matching the heights of the text blocks and the numbering marks avoids marking a tall text block with a short numbering mark, or a short text block with a tall one. This narrows the gap between the height of each text block and that of its numbering mark, and prevents an unclear correspondence between numbering marks and text blocks from making model prediction harder. On the other hand, generating the numbering marks randomly improves the predictive ability of the model, so that it outputs a more accurate numbering mark ordering in application.

By way of example, text character content in each text block is generally different, so that when the number marks are determined, the characteristics can be utilized to increase image characteristic differences of each text block, improve the capability of the number sorting model for identifying the number marks and predicting the number mark sorting, and further improve the accuracy of the obtained number mark sorting.

In an embodiment, determining the number mark of the text block may specifically be implemented as follows: randomly generating a label corresponding to the text block; and determining the label together with at least one character of the text block as the number mark of the text block.

Specifically, a corresponding label may be determined for each text block by random generation, and the label may be in numeric form. The label may be marked at a preset position of the corresponding text block, so that the label and at least one character in the corresponding text block form distinctive image information; the label and the at least one character in the text block may then be determined as the number mark of the text block.

FIG. 3 is a second schematic diagram of a marked target image according to the embodiment of the present invention. As shown in FIG. 3, after each text block is determined in the target image, a numeric label is randomly generated for each text block and marked on the left side of the corresponding text block. For example, the first two words in the text block corresponding to the numeric label 6 are "listed", so "6 listed" is determined as the number mark of that text block; the first two words in the text block corresponding to the numeric label 8 are "quality guarantee", so "8 quality guarantee" is determined as the number mark of that text block; the first two words in the text block corresponding to the numeric label 9 are "universal", so "9 universal" is determined as the number mark of that text block; the first two words in the text block corresponding to the numeric label 7 are "English", so "7 English" is determined as the number mark of that text block; and the first two words in the text block corresponding to the numeric label 2 are "Chinese", so "2 Chinese" is determined as the number mark of that text block. After the marked target image is input into the numbering and sorting model, an example of the number mark ordering output by the model is "6 listed; 8 quality guarantee … universal; 7 English; 2 Chinese; …".

In this embodiment, when the number mark of the text block is determined, a label corresponding to the text block is randomly generated, and the label together with at least one character of the text block is determined as the number mark of the text block. In this way, the marked label and some of the characters in the corresponding text block jointly serve as the prediction target of the model, which further improves the capability and stability of the numbering and sorting model in predicting the number mark ordering.
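As a minimal sketch of this embodiment, a number mark may be composed from a randomly generated label and the leading characters of the text block. The function name `build_number_mark`, the default of two characters, and the space separator are assumptions made for illustration.

```python
def build_number_mark(label: int, block_text: str, n_chars: int = 2) -> str:
    """Combine a numeric label with the first n_chars characters of the block.

    The label comes first because the embodiment marks it on the left
    side of the text block.
    """
    if n_chars < 1:
        raise ValueError("at least one character of the text block is required")
    prefix = block_text.strip()[:n_chars]
    return f"{label} {prefix}"
```

For Chinese text, two characters typically correspond to one short word, which is why the example of FIG. 3 speaks of the first two words of each text block.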

In one implementation, in order to further improve the prediction capability of the numbering and sorting model, determining the label and at least one character in the text block as the number mark of the text block may include: in the case where the at least one character comprises all characters in the text block, surrounding the text block with its minimum bounding box to obtain a box-selected text block; and determining the label and the box-selected text block as the number mark of the text block.

Specifically, when the number marks are determined, a numeric label may be randomly generated for each text block and marked on the left side of the corresponding text block, and each text block is surrounded by its minimum bounding box to obtain the box-selected text block.

As shown in FIG. 2, each text block is surrounded by its minimum bounding box to obtain a box-selected text block, a numeric label is randomly generated for each text block and marked on the left side of that text block, and the label and the box-selected text block are jointly determined as the number mark of the text block.

It should be understood that when the initial number ordering model is trained, the sample number marks of the sample images are determined by adopting the method, so that the model can more easily extract the image features corresponding to each text block during training, and the difficulty of model training is reduced. When the trained numbering and sequencing model is applied to prediction, the numbering marks corresponding to the text blocks in the target image are determined in the mode, so that the numbering and sequencing model can more accurately extract the image characteristics of the text blocks in the target image, and further more accurate numbering and marking sequencing can be obtained.

In this embodiment, when the label and at least one character in the text block are determined as the number mark of the text block, and the at least one character comprises all characters in the text block, the text block is surrounded by its minimum bounding box to obtain a box-selected text block, and the label and the box-selected text block are determined as the number mark of the text block. In this way, the label and the whole box-selected text block serve as the number mark, so that the difference in image features among the text blocks is larger. The character differences within the text blocks also strengthen the unique identification function of the number mark, that is, they serve to verify which text block a mark belongs to. The robustness of the numbering and sorting model when outputting the number mark ordering can therefore be improved.
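For illustration, the box selection may be sketched as below, assuming an axis-aligned minimum bounding box computed from the corner points of a detected text block. The embodiment does not state whether the box is axis-aligned or rotated; the function names and the `gap` parameter are likewise hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def min_bounding_box(points: Iterable[Point]) -> Box:
    """Smallest axis-aligned rectangle containing every given point."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


def label_anchor(box: Box, gap: float = 4.0) -> Point:
    """Point just left of the box, vertically centred, where the label ends."""
    left, top, _, bottom = box
    return (left - gap, (top + bottom) / 2)
```

The anchor keeps the numeric label adjacent to its box, so that the label and the box-selected text block read as one number mark.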

In some application scenarios, the color of the image may have a certain influence on the identification of the numbering mark, for example, the background color of the target image is black, if the color of the numbering mark is fixedly set to be black, the numbering mark marked on the target image may not be accurately identified, and at this time, the numbering mark cannot play a role in enhancing the image information of each text block. To avoid this, the color of the numbering marks may be adaptively determined to circumvent this risk.

In an embodiment, determining the number mark of the text block may specifically be determining the pixel color of each pixel point based on the pixel value of each pixel point in the target image; determining the pixel color with the maximum number of pixel points as the reference color of the target image; determining a target color based on the reference color, wherein the color difference between the target color and the reference color is larger than a preset color difference; the numbered indicia of the text block is determined based on the target color.

Specifically, the pixel value of each pixel point in the target image may be, for example, an RGB value, that is, the value of the R channel, the value of the G channel, and the value of the B channel; it may also be a grayscale value or the like.

Illustratively, while determining each text block in the target image, the pixel value of each pixel point in the target image can be analyzed and determined through an image pixel value analysis algorithm, so as to obtain the pixel value of each pixel point; in the case where the input target image includes pixel value information of each pixel, the pixel value of each pixel may be directly acquired.

The pixel values are then analyzed to determine the pixel color of each pixel point. For example, pixel colors are divided in advance into a preset number of color categories based on value ranges of the RGB channels, such as the seven color categories of black, white, red, yellow, blue, green, and gray. By comparing the pixel value of each pixel point in the target image with the preset RGB value range of each color category, the pixel color of each pixel point can be determined. The number of pixel points of each pixel color is then counted, and the pixel color with the largest number of pixel points is determined as the reference color of the target image.

Further, a target color is determined based on the reference color, and a color difference between the target color and the reference color is greater than a preset color difference. This step is for determining, as the target color, a color that contrasts with the target image color, for setting as the color of the numbering mark. Specifically, after the reference color is determined, at least one target color may be determined based on the reference color and a preset color difference. The preset color difference may be a preset color difference value, for example, preset color difference values between colors of black, white, red, yellow, blue, green and gray pixels may be calculated, an average value is calculated based on the respective color difference values, and the obtained average color difference value is determined as the preset color difference.

For example, if it is determined that the reference color of the target image is black and the color difference between black and white is greater than the preset color difference, white may be determined as the target color. When the number marks of the text block are determined, the color of each number mark is set to be white, and the white number marks are marked in the target image, so that the number marks in the target image can be conveniently identified, and the number mark sequence can be conveniently predicted.

In this embodiment, when determining the number mark of the text block, the pixel color of each pixel point is determined based on the pixel value of each pixel point in the target image; determining the pixel color with the maximum number of pixel points as the reference color of the target image; determining a target color based on the reference color, wherein the color difference between the target color and the reference color is larger than a preset color difference; the numbered indicia of the text block is determined based on the target color. Based on the method, the colors of the numbering marks are determined in a self-adaptive mode, the colors of the numbering marks are prevented from being identical or similar to the colors of the target images, so that the prediction difficulty in model training and model application can be reduced, and the accuracy of the output ordering of the numbering marks is improved.
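The adaptive color selection may be sketched, under stated assumptions, as follows. The embodiment classifies pixels by preset RGB value ranges without giving the ranges; this sketch instead assigns each pixel to the nearest of seven representative colors and measures color difference as Euclidean distance in RGB space. The `PALETTE` values and all function names are assumptions of the sketch.

```python
import math
from collections import Counter
from itertools import combinations

# Assumed representative RGB values for the seven color categories.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "yellow": (255, 255, 0),
    "blue": (0, 0, 255),
    "green": (0, 255, 0),
    "gray": (128, 128, 128),
}


def color_difference(a, b):
    """Euclidean distance between two RGB triples (an assumed metric)."""
    return math.dist(a, b)


def classify_pixel(rgb):
    """Name of the palette color closest to the pixel."""
    return min(PALETTE, key=lambda name: color_difference(PALETTE[name], rgb))


def reference_color(pixels):
    """Color category covering the largest number of pixels."""
    counts = Counter(classify_pixel(p) for p in pixels)
    if not counts:
        raise ValueError("the image contains no pixels")
    return counts.most_common(1)[0][0]


def preset_color_difference():
    """Average pairwise difference between the palette colors."""
    diffs = [color_difference(a, b) for a, b in combinations(PALETTE.values(), 2)]
    return sum(diffs) / len(diffs)


def target_colors(reference, threshold):
    """Palette colors differing from the reference by more than the threshold,
    most contrasting first."""
    ranked = sorted(
        ((color_difference(PALETTE[reference], rgb), name)
         for name, rgb in PALETTE.items() if name != reference),
        reverse=True,
    )
    return [name for diff, name in ranked if diff > threshold]
```

With a predominantly black image, the first candidate returned is white, which agrees with the example given above.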

For example, when determining the number marks, the fonts of the number marks can be adaptively determined, so that the number marks are clearer and more obvious in the target image, and the recognition and the prediction of the number ordering model are facilitated.

For example, while each text block in the target image is determined, the font of each text block may be determined by a font analysis algorithm, and the font of the text block is determined as the target font. When the numbering mark is determined, the target font is set to the numbered mark font. Alternatively, while each text block in the target image is determined, the character type in each text block is determined, for example, text in the text block is detected as a chinese character, an english character, a german character, or the like. Based on the preset corresponding relation between the character type and the font, the target font corresponding to the text block can be determined, and the target font is determined to be the font marked by the number. After the target font is determined, each numbered mark of the target font can be marked at a preset position of each text block in the target image.

In this embodiment, when determining the number label, the font of the self-adaptive number label can be determined, so as to achieve the purpose of clearly and obviously labeling the number label, and facilitate the visual processing of a computer, that is, the difficulty of predicting the number sorting model can be reduced, and the accuracy of prediction can be improved.
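As one possible sketch of the second approach, in which a font is selected through a preset correspondence between character type and font, the code below detects the dominant script of a text block and looks up a font name. The detection by Unicode range, the two script categories, and the font names in `FONT_BY_SCRIPT` are assumptions for illustration; distinguishing languages that share a script, such as English and German, would require a language detector that is not shown here.

```python
from collections import Counter

# Assumed correspondence between character type and font.
FONT_BY_SCRIPT = {"chinese": "SimSun", "latin": "Arial"}
DEFAULT_FONT = "Arial"


def detect_script(text: str) -> str:
    """Return the dominant script of the text: 'chinese', 'latin' or 'unknown'."""
    counts = Counter()
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":      # CJK Unified Ideographs block
            counts["chinese"] += 1
        elif ch.isascii() and ch.isalpha():  # basic Latin letters
            counts["latin"] += 1
    return counts.most_common(1)[0][0] if counts else "unknown"


def choose_mark_font(block_text: str) -> str:
    """Target font for the number mark of a text block."""
    return FONT_BY_SCRIPT.get(detect_script(block_text), DEFAULT_FONT)
```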

In practical applications, in order to further enhance the unique identification function of the number mark and improve its clarity, the label may be combined with the background image at the position where the label is marked to determine the number mark of the text block.

In an embodiment, determining the number mark of the text block may specifically include: randomly generating a label corresponding to the text block; extracting the background image at the position of the label; and determining the background image and the label as the number mark of the text block.

Specifically, when the number marks are determined, a numeric label may be randomly generated for each text block and marked at the preset position of the corresponding text block. The background image at the position of the label is then extracted; the background image may be the portion of the target image at that position. For example, when the target image includes a background pattern, the background image is extracted at the position where the numeric label is marked, based on the center point of the numeric label and a preset area. The numeric label together with the background image is determined as the number mark of the text block.

In this embodiment, the labels corresponding to the text blocks are randomly generated; extracting a background image at the position of the mark; the background image and the label are determined to be the numbered labels of the text blocks, so that the uniqueness and the clarity of the numbered labels can be further enhanced, the image characteristics of each text block can be enhanced by adopting the mode in the model training and model reasoning stages, and the prediction capability of the model is further improved, so that the numbered label sequence with higher accuracy is obtained.
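A minimal sketch of extracting the background image around a label is given below, assuming the image is held as a list of pixel rows and the preset area is a rectangle of given width and height around the center point of the label. The function name and the clamping to the image bounds are assumptions of the sketch.

```python
from typing import List, Sequence, Tuple


def crop_background(
    image: Sequence[Sequence],
    center: Tuple[int, int],
    size: Tuple[int, int],
) -> List[list]:
    """Crop a size=(width, height) region around center=(cx, cy).

    The region is clamped to the image, so a label near an edge yields
    a smaller crop instead of an error.
    """
    img_height = len(image)
    img_width = len(image[0]) if img_height else 0
    cx, cy = center
    width, height = size
    left = max(0, cx - width // 2)
    top = max(0, cy - height // 2)
    right = min(img_width, left + width)
    bottom = min(img_height, top + height)
    return [list(row[left:right]) for row in image[top:bottom]]
```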

It should be understood that the technical features in the above embodiments may be applied in any combination, provided they do not conflict with one another, so as to improve the accuracy of text splicing.

In practical applications, there are many business scenarios in which the key text information in the target image needs to be extracted in a structured manner, where the structured extraction can be understood as extracting the information that needs to be extracted in the business scenario. For example, in business scenarios such as financial reimbursement, certificate identification, intelligent claims or case extraction, targeted extraction of information of interest in images such as financial statements, certificates, insurance policies or cases is required. For both the plain text image and the rich document image in various information forms including text, graphics or tables, the following two types of methods can be used for extracting the structured information.

The first method mainly comprises two steps, wherein the first step is to detect and identify texts in a target image through a text identification tool to obtain all texts contained in the target image; and secondly, aiming at all the texts obtained in the first step, carrying out model prediction processing on all the texts by using a text information extraction model so as to obtain target text information needing to be extracted.

In this type of approach, the first step is typically text recognition based on an OCR model. The OCR model and the information extraction model are decoupled and independent of each other, and a general-purpose OCR model recognizes text block by block. That is, after each text block in the target image is detected and recognized separately, the results of the text blocks must be spliced correctly to obtain passages that conform to the real reading order. Errors in the splicing order introduce a large amount of text noise into the second-stage information extraction and pose a considerable challenge to the robustness of the subsequent extraction model.

For the splicing step, such methods mostly splice the text with preset rules based on text position information and the like, or introduce a layout analysis model that combines features of the text such as position and content. Splicing text blocks with preset rules cannot adapt to rich formats; a layout analysis model is easily disturbed by image features, requires a large labeling workload, and must be matched against the OCR text blocks, which inevitably introduces new errors and noise. The information extracted by this type of method therefore has a relatively high error rate.

The second method is based on a model capable of directly extracting information from the image, and the target text information output by the model is obtained after the target image is input into the model. Although this approach bypasses the steps of text recognition and text stitching, its accuracy is entirely dependent on the image recognition and prediction capabilities of the model. Under the conditions that the format is relatively fixed and the target field is short, such as an identity card or a train ticket, the model is easy to fit and the template type extraction is realized. However, the scheme has poor expansion applicability for the conditions of changeable formats and longer field content, and is difficult to have higher extraction accuracy in an open format scene.

Based on the problems, the text splicing method provided by the embodiment of the invention can solve the problem of lower text splicing accuracy. After the text splicing accuracy is improved, the accuracy of information extraction can be improved based on more accurate spliced text.

In an embodiment, the text splicing method further includes: receiving a text splicing request, wherein the text splicing request comprises information to be extracted; and extracting target information corresponding to the information to be extracted from the spliced text based on the text splicing request.

Specifically, the splicing request may be any form of request for triggering the start of text splicing, where the text splicing request includes information to be extracted. The information to be extracted can be understood as information which needs to be extracted when the extraction is structured.

For example, for the target images of multiple insurance clauses, extracting text information corresponding to the clauses of 'contract establishment and effectiveness' in each target image, wherein the text information corresponding to the clauses is the target information. For another example, for the target images of the multiple drug specifications, text information corresponding to the "specification" in each target image is extracted, and the text information corresponding to the "specification" is the target information, as in fig. 3, "200ml" is the target information to be extracted.

Based on the text splicing method provided by the embodiments, text splicing is performed on each text block in the target image, so that spliced text with higher accuracy can be obtained. Based on the text splicing request, target information corresponding to the information to be extracted can be extracted from the spliced text.

For example, according to the task requirements of the application scenario, an initial information extraction model is trained to obtain an information extraction model capable of extracting target information from the spliced text. The marked target image is input into the numbering and sorting model to obtain the number mark ordering output by the model; the text block corresponding to each number mark is retrieved based on the number mark ordering, and the text blocks are arranged in that order to obtain the spliced text. The spliced text is then input into the trained information extraction model to extract the target information.
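The retrieval and arrangement of text blocks by number mark ordering may be sketched as follows. The sketch assumes that the model output is a semicolon-separated string in which each item begins with the numeric label, as in the example of FIG. 3; the function names and the error handling are hypothetical.

```python
import re
from typing import Dict, List


def parse_mark_order(model_output: str) -> List[int]:
    """Extract the leading numeric label of each semicolon-separated item."""
    order = []
    for item in model_output.split(";"):
        match = re.match(r"\s*(\d+)", item)
        if match:  # items without a leading number are skipped
            order.append(int(match.group(1)))
    return order


def splice_text_blocks(
    blocks_by_label: Dict[int, str],
    order: List[int],
    sep: str = "",
) -> str:
    """Concatenate the text blocks in the predicted order."""
    unknown = [label for label in order if label not in blocks_by_label]
    if unknown:
        raise KeyError(f"labels not present in the image: {unknown}")
    return sep.join(blocks_by_label[label] for label in order)
```

Because the labels are unique within an image, a dictionary keyed by label is sufficient to look up each text block.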

In this embodiment, a text splicing request is received, where the text splicing request includes information to be extracted; based on the text splicing request, target information corresponding to the information to be extracted can be extracted from the spliced text. In this method, by marking number marks in the target image, the task of directly splicing the text blocks in sequence to obtain the full spliced text is converted into a two-step task: first predicting the correct splicing order of the text blocks, and then splicing them. Although this adds a step, the number marks improve the ordering accuracy and hence the correctness of the spliced text, which in turn safeguards the accuracy of the extracted target information. The method provided by this embodiment is easy for the model to learn, is not affected by the format or content of the document, and can effectively improve the accuracy of information extraction.

The process of extracting the information is described in detail below with reference to fig. 4 by way of a specific embodiment. Fig. 4 is a schematic flow chart of information extraction according to an embodiment of the present invention, and as shown in fig. 4, the information extraction may include steps 410 to 440.

Step 410, image preprocessing.

Specifically, the target image is preprocessed. The preprocessing may include correcting the orientation of the target image and denoising the target image.

For example, the orientation of the target image is detected, and when the orientation does not meet the information extraction requirement, the orientation of the target image or of the text in the target image is corrected. For another example, interference factors such as watermarks, folds, blurred areas, or occlusions in the target image are eliminated to achieve a denoising effect.

Step 420, text detection and recognition.

Specifically, text blocks in the target image are detected and identified, and contents such as characters, symbols and the like in each text block are determined; position information of the text block in the target image is determined, for example, coordinates of a center point of the text block, width and height of the text block, coordinates of vertices of a minimum bounding box surrounding the text block, and the like.

Step 430, text block ordering and splicing.

Specifically, each text block in the target image may be spliced by any text splicing method in the above embodiments, so as to obtain a spliced text.

Step 440, information extraction.

Specifically, after the spliced text with higher accuracy is obtained, various methods for extracting information can be applied to extract target information corresponding to the information to be extracted from the spliced text. For example, extraction of the target information is achieved through an information extraction model.
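Steps 410 to 440 may be composed, purely by way of illustration, as in the sketch below. Each stage is passed in as a callable, because the embodiment does not fix a particular preprocessing routine, text recognizer, sorting model, or extraction model; every name here is hypothetical.

```python
from typing import Any, Callable, List


def run_information_extraction(
    image: Any,
    query: str,
    preprocess: Callable[[Any], Any],
    detect_and_recognize: Callable[[Any], List[Any]],
    order_and_splice: Callable[[Any, List[Any]], str],
    extract_info: Callable[[str, str], str],
) -> str:
    """Run the four steps in sequence and return the target information."""
    cleaned = preprocess(image)                  # step 410: correct orientation, denoise
    blocks = detect_and_recognize(cleaned)       # step 420: text blocks and positions
    spliced = order_and_splice(cleaned, blocks)  # step 430: mark, sort, splice
    return extract_info(spliced, query)          # step 440: extract target information
```

Keeping the stages as interchangeable callables reflects the statement above that various information extraction methods can be applied once the spliced text is available.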

By way of example, when the target image is an image of a paper or another document with many pages, variable layout, and a large number of words, the target information can be extracted quickly and accurately through the above steps. The number of marks required by the scheme of this embodiment depends only on the number of text blocks: only one number mark per text block needs to be annotated, and individual characters need not be annotated. When the target image has a relatively fixed format, such as a bill, the scheme can still focus on target key fields such as the amount, the company, and the tax number, and extract the corresponding target information in a structured manner, so the scheme has strong applicability.

In this embodiment, through the steps of image preprocessing, text detection and recognition, text block ordering and stitching, and information extraction, the structured information extraction with higher accuracy can be performed on the target information in the target image. Meanwhile, the labeling process of the numbering mark is simple, so that the complexity of the task and the cost for preparing training data are low, the influence of the number of characters in a sample image or a target image is avoided, extra risks brought by steps such as layout analysis and the like are avoided, the training and convergence of the model are facilitated, and the overall performance on the document image information extraction task can be improved.

The text splicing device provided by the embodiment of the invention is described below, and the text splicing device described below and the text splicing method described above can be referred to correspondingly.

Fig. 5 is a schematic structural diagram of a text splicing device according to an embodiment of the present invention, and referring to fig. 5, a text splicing device 500 includes:

a first determining module 510, configured to determine each text block in the target image;

the second determining module 520 is configured to determine, for each text block in the target image, a number mark of the text block, and mark the number mark at a preset position corresponding to the text block, so as to obtain a marked target image;

the processing module 530 is configured to input the labeled target image into a numbering and sorting model, and obtain a numbering and sorting output by the numbering and sorting model; the serial number ordering model is obtained by training an initial serial number ordering model based on sample images and corresponding ordering labels, the sample images are images obtained by adding sample serial number marks to sample texts in the sample images, and the ordering labels are labels obtained by ordering the sample serial number marks based on the sequence of the sample texts;

and the splicing module 540 is used for splicing the text blocks based on the number mark ordering to obtain spliced texts.

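In an example embodiment, the second determining module 520 is specifically configured to:

determine the height of the text block in the target image based on the position information of the text block in the target image;

determine a height of a number mark that matches the height of the text block;

randomly generate the number mark based on the height of the number mark.

In an example embodiment, the second determining module 520 is specifically configured to:

randomly generate a label corresponding to the text block;

determine the label and at least one character of the text block as the number mark of the text block.

In an example embodiment, the second determining module 520 is specifically configured to:

surround the text block with its minimum bounding box to obtain a box-selected text block in the case where the at least one character comprises all characters in the text block;

determine the label and the box-selected text block as the number mark of the text block.

In an example embodiment, the second determining module 520 is specifically configured to:

determine the pixel color of each pixel point based on the pixel value of each pixel point in the target image;

determine the pixel color with the largest number of pixel points as the reference color of the target image;

determine a target color based on the reference color, wherein the color difference between the target color and the reference color is larger than a preset color difference;

determine the number mark of the text block based on the target color.

In an example embodiment, the second determining module 520 is specifically configured to:

randomly generate a label corresponding to the text block;

extract a background image at the position of the label;

determine the background image and the label as the number mark of the text block.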

In an example embodiment, the text splicing device 500 further includes a receiving module and an extracting module;

the receiving module is used for receiving a text splicing request, wherein the text splicing request comprises information to be extracted;

the extraction module is used for extracting target information corresponding to the information to be extracted from the spliced text based on the text splicing request.

The apparatus of the present embodiment may be used to execute the method of any one of the text splicing method embodiments. Its specific implementation process and technical effects are similar to those of the method embodiments; reference may be made to the detailed description of the method embodiments, which is not repeated herein.

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, as shown in fig. 6, the electronic device may include: processor 610, communication interface (Communications Interface) 620, memory 630, and communication bus 640, wherein processor 610, communication interface 620, and memory 630 communicate with each other via communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a text splicing method comprising: determining each text block in the target image; determining the number marks of the text blocks aiming at each text block in the target image, and marking the number marks at preset positions corresponding to the text blocks to obtain a marked target image; inputting the marked target image into a numbering and sorting model to obtain numbering mark sorting output by the numbering and sorting model; the serial number ordering model is obtained by training an initial serial number ordering model based on sample images and corresponding ordering labels, the sample images are images obtained by adding sample serial number marks to sample texts in the sample images, and the ordering labels are labels obtained by ordering the sample serial number marks based on the sequence of the sample texts; and splicing the text blocks based on the number mark sequencing to obtain a spliced text.

Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

In another aspect, an embodiment of the present invention further provides a non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, is implemented to perform the text splicing method provided by the above methods, the method including: determining each text block in the target image; determining the number marks of the text blocks aiming at each text block in the target image, and marking the number marks at preset positions corresponding to the text blocks to obtain a marked target image; inputting the marked target image into a numbering and sorting model to obtain numbering mark sorting output by the numbering and sorting model; the serial number ordering model is obtained by training an initial serial number ordering model based on sample images and corresponding ordering labels, the sample images are images obtained by adding sample serial number marks to sample texts in the sample images, and the ordering labels are labels obtained by ordering the sample serial number marks based on the sequence of the sample texts; and splicing the text blocks based on the number mark sequencing to obtain a spliced text.

In yet another aspect, an embodiment of the present invention further provides a computer program product including a computer program, the computer program being storable on a non-transitory computer-readable storage medium and, when executed by a processor, being capable of performing the text splicing method provided by the above method embodiments, the method including: determining each text block in the target image; for each text block in the target image, determining the number mark of the text block, and marking the number mark at a preset position corresponding to the text block to obtain a marked target image; inputting the marked target image into a numbering and sorting model to obtain the number mark ordering output by the numbering and sorting model, wherein the numbering and sorting model is obtained by training an initial numbering and sorting model based on sample images and corresponding ordering labels, the sample images are images obtained by adding sample number marks to the sample texts in the sample images, and the ordering labels are labels obtained by ordering the sample number marks based on the sequence of the sample texts; and splicing the text blocks based on the number mark ordering to obtain a spliced text.
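The construction of an ordering label from the sequence of the sample texts can be sketched as follows. This is an illustrative assumption rather than the training procedure of the invention: the function name `make_ordering_label` is hypothetical, the sample texts are assumed to be supplied already in their correct reading order, and the sample number marks are shuffled so that a mark's value carries no positional information.

```python
import random
from typing import List, Sequence, Tuple


def make_ordering_label(
    sample_texts: Sequence[str], rng: random.Random
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Pair each sample text with a shuffled number mark and derive the label.

    sample_texts must be given in their correct reading order (assumption).
    Returns (marked_texts, ordering_label), where ordering_label lists the
    sample number marks in the order in which their texts should be read.
    """
    marks = [str(i + 1) for i in range(len(sample_texts))]
    rng.shuffle(marks)  # a mark's value reveals nothing about its position
    marked_texts = list(zip(marks, sample_texts))
    ordering_label = [mark for mark, _ in marked_texts]
    return marked_texts, ordering_label


if __name__ == "__main__":
    marked, label = make_ordering_label(
        ["Name:", "Zhang San", "Date:", "2023-11-24"], random.Random(0)
    )
    print(marked)
    print(label)
```

Reading the marked texts back in the order given by the label reproduces the original sample text sequence, which is the property the ordering label is meant to encode.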

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or, of course, by means of hardware. Based on this understanding, the foregoing technical solution, in essence, or the part of it that contributes over the prior art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the respective embodiments or in some parts of the embodiments.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method of text splicing, comprising:

determining each text block in the target image;

2. The text splicing method of claim 1, wherein the determining the numbered indicia of the text block comprises:

3. The text splicing method of claim 1, wherein the determining the numbered indicia of the text block comprises:

randomly generating a label corresponding to the text block;

4. A method of text splicing according to claim 3, wherein said determining at least one character of said label and said text block as a numbered label of said text block comprises:

5. The text splicing method of claim 1, wherein the determining the numbered indicia of the text block comprises:

determining the numbered indicia of the text block based on the target color.

6. The text splicing method of claim 1, wherein the determining the numbered indicia of the text block comprises:

randomly generating a label corresponding to the text block;

extracting a background image at the position of the mark;

7. The text splicing method according to any one of claims 1 to 6, further comprising:

8. A text splicing device, comprising:

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the text splicing method of any of claims 1 to 7.

10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the text splicing method according to any of claims 1 to 7.