
CN115937864B - Text overlap detection method, device, medium and electronic equipment - Google Patents

Text overlap detection method, device, medium and electronic equipment

Info

Publication number
CN115937864B
CN115937864B (application CN202211678556.7A)
Authority
CN
China
Prior art keywords
text
overlapping
detected
vector
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211678556.7A
Other languages
Chinese (zh)
Other versions
CN115937864A (en)
Inventor
梁晓云
高永强
杨萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Douyin Vision Co Ltd
Original Assignee
Douyin Vision Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Douyin Vision Co Ltd filed Critical Douyin Vision Co Ltd
Priority to CN202211678556.7A priority Critical patent/CN115937864B/en
Publication of CN115937864A publication Critical patent/CN115937864A/en
Application granted
Publication of CN115937864B publication Critical patent/CN115937864B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Character Discrimination (AREA)

Abstract


This disclosure relates to a text overlap detection method, apparatus, medium, and electronic device, belonging to the field of computer technology, and can improve the accuracy and recall precision of text overlap detection. A text overlap detection method includes: performing character recognition on the object to be detected to obtain the character recognition confidence scores of text lines in the object; adding text lines with character recognition confidence scores lower than a preset recognition confidence threshold to a first candidate anomaly region set; extracting text line images of each text line from the object to be detected, performing text classification on the text line images, and adding text lines whose text classification results are overlapping text to a second candidate anomaly region set; performing target detection on the overlapping text in the object to be detected, and adding text lines whose target detection results are overlapping text to a third candidate anomaly region set; and determining the intersection of the first, second, and third candidate anomaly region sets as the text overlap detection result.

Description

Text overlap detection method, device, medium and electronic equipment
Technical Field
The disclosure relates to the technical field of computers, and in particular relates to a text overlap detection method, a text overlap detection device, a text overlap detection medium and electronic equipment.
Background
Text overlap anomalies in an application program (as shown in the text overlap schematic diagram of fig. 1) seriously degrade the user experience; when the overlap is severe, the user cannot understand the page information at all.
Thus, an intelligent detection scheme for text overlap is urgently needed.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a text overlap detection method, comprising: performing character recognition on an object to be detected to obtain the character recognition confidence of each text line in the object to be detected; adding text lines whose character recognition confidence is lower than a preset recognition confidence threshold to a first candidate abnormal region set; cropping text line images of the respective text lines from the object to be detected, performing text classification on the text line images, and adding text lines whose classification result is overlapping text to a second candidate abnormal region set; performing target detection on overlapping text in the object to be detected, and adding text lines whose detection result is overlapping text to a third candidate abnormal region set; and determining the intersection of the first, second, and third candidate abnormal region sets as the text overlap detection result.
In a second aspect, the present disclosure provides a text overlap detection apparatus comprising a character recognition module, a text classification module, a target detection module, and a determination module. The character recognition module is configured to perform character recognition on an object to be detected to obtain the character recognition confidence of each text line in the object to be detected, and to add text lines whose confidence is lower than a preset recognition confidence threshold to a first candidate abnormal region set. The text classification module is configured to crop text line images of the respective text lines from the object to be detected, perform text classification on the text line images, and add text lines whose classification result is overlapping text to a second candidate abnormal region set. The target detection module is configured to perform target detection on overlapping text in the object to be detected and to add text lines whose detection result is overlapping text to a third candidate abnormal region set. The determination module is configured to determine the intersection of the first, second, and third candidate abnormal region sets as the text overlap detection result.
In a third aspect, the present disclosure provides a computer readable medium having stored thereon a computer program which, when executed by a processing device, implements the steps of the method according to any of the first aspects of the disclosure.
In a fourth aspect, the present disclosure provides an electronic device comprising storage means having stored thereon a computer program, processing means for executing the computer program in the storage means to carry out the steps of the method of any one of the first aspects of the present disclosure.
By adopting the above technical solution, the first candidate abnormal region set of the object to be detected is obtained by character recognition, the second candidate abnormal region set by text classification, and the third candidate abnormal region set by target detection, and the text overlap detection result is determined as the intersection of the three sets. Because the three detection approaches are largely independent, taking their intersection suppresses false alarms and improves the accuracy and recall precision of text overlap detection. In addition, the labor cost of text overlap detection is greatly reduced.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale. In the drawings:
fig. 1 shows a schematic diagram of text overlap.
Fig. 2 is a flow diagram of a text overlap detection method according to an embodiment of the present disclosure.
Fig. 3 shows a schematic diagram with overlapping text lines and overlapping areas as detection targets in an object to be detected.
Fig. 4 is a flow chart of text classification of a text line image according to an embodiment of the present disclosure.
Fig. 5 shows a schematic diagram of the architecture of a Transformer encoder.
Fig. 6 shows a schematic architecture of a multi-layer perceptron.
Fig. 7 shows a schematic diagram of a text line image after filling.
Fig. 8 shows an architectural diagram of a text classifier according to an embodiment of the present disclosure.
Fig. 9 shows an architectural diagram of object detection according to an embodiment of the present disclosure.
FIG. 10 is a schematic diagram of automatically generating training samples by extracting foreground characters and superimposing the foreground characters to other locations in accordance with an embodiment of the present disclosure.
Fig. 11 is a schematic block diagram of a text overlap detection apparatus according to an embodiment of the present disclosure.
Fig. 12 shows a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; "another embodiment" means "at least one additional embodiment"; and "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
It will be appreciated that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, usage scenarios, etc. of the personal information involved, and the user's authorization should be obtained.
For example, in response to receiving an active request from a user, a prompt is sent to the user to explicitly inform the user that the requested operation will require the acquisition and use of the user's personal information. The user can thus autonomously decide, based on the prompt, whether to provide personal information to the software or hardware, such as an electronic device, application program, server, or storage medium, that executes the operations of the technical solution of the present disclosure.
As an optional but non-limiting implementation, in response to receiving an active request from a user, the prompt information may be sent to the user, for example, via a popup window, in which the prompt information may be presented as text. In addition, the popup window may carry a selection control allowing the user to choose "agree" or "disagree" to providing personal information to the electronic device.
It will be appreciated that the above-described notification and user authorization process is merely illustrative and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
Meanwhile, it can be understood that the data (including but not limited to the data itself, the acquisition or the use of the data) related to the technical scheme should conform to the requirements of the corresponding laws and regulations and related regulations.
Fig. 2 is a flow diagram of a text overlap detection method according to an embodiment of the present disclosure. As shown in fig. 2, the text overlap detection method includes the following steps S21 to S27.
In step S21, character recognition is performed on the object to be detected to obtain the character recognition confidence of each text line in the object to be detected.
The object to be detected refers to an object requiring text overlap detection. The object to be detected may be, for example, a screenshot of an APP, a screenshot of a game interface, etc.
Character recognition may be implemented using optical character recognition (Optical Character Recognition, OCR) or any other type of character recognition algorithm.
In this step, the character recognition confidence of each text line of the object to be detected is obtained through character recognition. For example, assuming that the object to be detected has 3 text lines, namely text line 1, text line 2, and text line 3, character recognition is performed on the three text lines respectively, yielding recognition confidence 1 for text line 1, recognition confidence 2 for text line 2, and recognition confidence 3 for text line 3.
In step S22, a text line having a text recognition confidence below a preset recognition confidence threshold is added to the first set of candidate abnormal regions.
The preset recognition confidence threshold can be set empirically or can be obtained by self-learning. For example, the text recognition confidence of the text line where the text overlap exists is self-learned, and accordingly, an appropriate preset recognition confidence threshold is set. For example, the preset recognition confidence threshold may be set to 0.8 or other suitable value.
In this step, if the text recognition confidence of a text line is lower than the preset recognition confidence threshold, the text line is added to the first candidate abnormal region set, for example, if the foregoing text recognition confidence 1 is lower than the preset recognition confidence threshold, and the text recognition confidence 2 and the text recognition confidence 3 are both greater than the preset recognition confidence threshold, the text line 1 is added to the first candidate abnormal region set.
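Steps S21 and S22 can be sketched as a simple confidence filter. This is an illustration only; the names `first_candidate_set` and `ocr_results`, and the (line id, confidence) pair format, are assumptions, and the threshold value follows the example above.

```python
# Sketch of steps S21-S22: collect text lines whose character recognition
# confidence falls below the preset recognition confidence threshold.
# The OCR results are assumed inputs (e.g., produced by an OCR engine).

RECOGNITION_CONF_THRESHOLD = 0.8  # preset threshold, per the example above

def first_candidate_set(ocr_results, threshold=RECOGNITION_CONF_THRESHOLD):
    """ocr_results: list of (line_id, confidence) pairs."""
    return {line_id for line_id, conf in ocr_results if conf < threshold}

# Example: text line 1 falls below the threshold, lines 2 and 3 do not.
results = [("line1", 0.55), ("line2", 0.95), ("line3", 0.91)]
print(first_candidate_set(results))  # {'line1'}
```

In practice the threshold could also be learned from confidence statistics of known overlapping lines, as the text suggests.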
In step S23, text line images of the respective text lines of the object to be detected are cut from the object to be detected, and the text line images are text-classified.
In some embodiments, capturing text line images of individual text lines of an object to be detected from the object to be detected may include first obtaining coordinate information of each text line in the object to be detected, e.g., coordinate information of each text line may be obtained using a text recognition tool (e.g., OCR recognition), and then capturing text line images of each text line from the object to be detected, i.e., one text line is captured as one text line image, assuming that the object to be detected includes a total of 3 text lines, then a total of 3 text line images may be captured.
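The cropping described above can be sketched with array slicing; the (x1, y1, x2, y2) box format is an assumption for illustration, standing in for the coordinate information returned by an OCR tool.

```python
import numpy as np

# Sketch of step S23's cropping: given per-text-line bounding boxes (e.g.,
# from OCR), cut one image per text line out of the screenshot.

def crop_text_lines(image, boxes):
    """image: H x W x C array; boxes: list of (x1, y1, x2, y2) tuples."""
    return [image[y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]

screenshot = np.zeros((100, 200, 3), dtype=np.uint8)
boxes = [(10, 5, 190, 25), (10, 30, 120, 50), (10, 60, 80, 80)]
crops = crop_text_lines(screenshot, boxes)
print([c.shape for c in crops])  # [(20, 180, 3), (20, 110, 3), (20, 70, 3)]
```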
In some embodiments, text classification of the text line images may include classifying each text line image separately using any of various text classifiers, for example, a classification algorithm for visual tasks.
In step S24, text lines whose text classification result is overlapping text are added to the second candidate abnormal region set.
After the text classification result of each text line image is obtained in step S23, text lines whose classification result is overlapping text may be added to the second candidate abnormal region set in step S24.
For example, assuming that the text classification result of text line 1 is overlapping text and the text classification results of text line 2 and text line 3 are both normal text, text line 1 would be added to the second set of candidate abnormal regions.
In step S25, object detection is performed on the overlapped text in the object to be detected.
Various object detection algorithms may be employed to object detect overlapping text in an object to be detected, such as the Darknet object detection algorithm.
In some embodiments, performing target detection on overlapping text in an object to be detected may include performing target detection with overlapping text lines and overlapping areas as detection targets in the object to be detected. That is, there are two objects for target detection, one is overlapping text lines and the other is overlapping area.
In step S26, text lines whose target detection result is overlapping text are added to the third candidate abnormal region set.
For example, if the object detection algorithm detects that text line 1 is overlapping text and text line 2 and text line 3 are both normal text, text line 1 will be added to the third set of candidate abnormal regions in this step.
In some embodiments, when overlapping text lines and overlapping areas are both taken as detection targets in the object to be detected, step S26 may include adding an overlapping text line to the third candidate abnormal region set if the target detection result indicates that the coordinates of that overlapping text line overlap with the coordinates of some overlapping area. This improves the accuracy of target detection and reduces false alarms. For example, for the object to be detected shown in fig. 3, if overlapping text line 1 and overlapping area 2 are detected and their coordinates overlap, overlapping text line 1 may be added to the third candidate abnormal region set.
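The coordinate-overlap check can be sketched as a box intersection test; the (x1, y1, x2, y2) box format and the function names are assumptions for illustration.

```python
# Sketch of the check in step S26: a detected overlapping text line is kept
# only if its box intersects at least one detected overlapping region.

def boxes_intersect(a, b):
    """Axis-aligned boxes (x1, y1, x2, y2); True if they share any area."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def third_candidate_set(text_line_boxes, overlap_region_boxes):
    """text_line_boxes: dict line_id -> box; overlap_region_boxes: list of boxes."""
    return {
        line_id
        for line_id, box in text_line_boxes.items()
        if any(boxes_intersect(box, region) for region in overlap_region_boxes)
    }

lines = {"line1": (0, 0, 100, 20), "line2": (0, 30, 100, 50)}
regions = [(40, 10, 60, 18)]  # intersects line1 only
print(third_candidate_set(lines, regions))  # {'line1'}
```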
In step S27, the intersection of the first, second, and third candidate abnormal region sets is determined as a text overlap detection result.
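Step S27 reduces to a set intersection once the three candidate sets identify text lines consistently (e.g., by line id); the identifiers below are illustrative.

```python
# Sketch of step S27: the final detection result is the intersection of the
# first, second, and third candidate abnormal region sets.

def text_overlap_result(first, second, third):
    return first & second & third

first = {"line1", "line4"}   # from character recognition confidence (S21-S22)
second = {"line1", "line2"}  # from text classification (S23-S24)
third = {"line1", "line3"}   # from target detection (S25-S26)
print(text_overlap_result(first, second, third))  # {'line1'}
```

Only lines flagged by all three approaches survive, which is what suppresses false alarms from any single detector.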
By adopting the above technical solution, the first candidate abnormal region set of the object to be detected is obtained by character recognition, the second candidate abnormal region set by text classification, and the third candidate abnormal region set by target detection, and the text overlap detection result is determined as the intersection of the three sets. Because the three detection approaches are largely independent, taking their intersection suppresses false alarms and improves the accuracy and recall precision of text overlap detection. In addition, the labor cost of text overlap detection is greatly reduced.
In some embodiments, the text classification of the text line image may include cutting the text line image into a plurality of image blocks, and text classification of each image block using a text classifier that utilizes a class vector, a transformer structure, and a classification multi-layer perceptron, wherein the class vector is used to integrate the whole-image features of the text line image.
The text line image is cut because most text lines are generally only partially overlapped, and in the example of the "1990/us/science fiction" text line in fig. 1, only the "0/us" area is overlapped, and other areas are not overlapped, so that the text lines with the text overlapping phenomenon can be better distinguished from the normal text lines by cutting the text line image.
The classification multi-layer perceptron refers to a multi-layer perceptron adopting a classification model, and the classification result of the classification model comprises overlapped texts and normal texts.
By adopting the above technical solution, text classification can be performed on each image block and on the class vector; the whole-image characteristic of the text line image (e.g., text overlapping or not) can then be known from the text classification result of the class vector, and the characteristic of each image block (e.g., text overlapping or not) from the text classification result of that image block.
Fig. 4 is a flow chart of text classification of a text line image according to an embodiment of the present disclosure. As shown in fig. 4, the text classification process includes steps S41 to S46.
In step S41, the text line image is cut into a plurality of image blocks, resulting in a first vector.
In some embodiments, the size of the convolution kernel of the neural network employed determines the size of the image blocks into which the text line image is cut. For example, if the convolution kernel is 16 × 16, the cut image blocks should be 16 × 16.
After the cut, the first vector obtained is k × n × m × j, where k is the number of image blocks obtained by cutting, n × m is the size of each image block, and j is the number of channels; for example, for a color text line image, the number of channels j is 3.
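The cut into a k × n × m × j first vector can be sketched with a reshape/transpose; the image dimensions are assumed to already be exact multiples of the block size (the padding step described later handles the general case).

```python
import numpy as np

# Sketch of step S41: cut an H x W x j text line image into k image blocks
# of size n x m, producing a k x n x m x j array.

def cut_into_blocks(image, n, m):
    h, w, j = image.shape
    blocks = image.reshape(h // n, n, w // m, m, j)       # split both axes
    blocks = blocks.transpose(0, 2, 1, 3, 4)              # group block indices
    return blocks.reshape(-1, n, m, j)                    # k = (h//n) * (w//m)

image = np.arange(32 * 64 * 3, dtype=np.float32).reshape(32, 64, 3)
blocks = cut_into_blocks(image, 16, 16)
print(blocks.shape)  # (8, 16, 16, 3)
```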
The text line image is cut because most text lines are generally only partially overlapped, such as the "1990/us/science fiction" text line example in fig. 1, only the "0/us" area is overlapped, and the text lines with the text overlap phenomenon can be better distinguished from the normal text lines by cutting the text line image.
In step S42, the first vector is linearly transformed to obtain a second vector.
The linear transformation (i.e., a fully connected layer) flattens the pixels of each image block into a vector and feeds each image block into a linear projection layer. The compressed dimension used in the linear transformation is D; for example, D may be 512 or another value. This step may also be referred to as patch embedding (Patch Embedding).
The second vector obtained through the processing of step S42 is k×n×m×d.
In step S43, a learnable class vector is added to the second vector to obtain a third vector, where the class vector is used to integrate the whole-image features of the text line image.
The third vector obtained after adding the class vector cls_token is (k+1) ×n×m×d.
In step S44, a position code is added to the third vector to obtain a fourth vector, where the position code is used to characterize the relative positional relationship of the image blocks.
Since the sequence information of the input series is lost in the subsequent encoding process, position encoding is added here so that the relative positional relationship of the individual image blocks can still be known after encoding. The fourth vector obtained after adding the position code is (k+1) × (n×m+1) ×d.
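Steps S42 through S44 can be sketched as follows under the common ViT-style convention in which each image block is flattened and projected to a single D-dimensional token; the intermediate shapes stated in the text differ slightly, so treat these shapes as illustrative, and note that the projection weights, class vector, and position encoding here are random stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of steps S42-S44: linear projection (patch embedding), prepending a
# learnable class vector, and adding a position encoding.

def embed_blocks(blocks, d):
    k, n, m, j = blocks.shape
    flat = blocks.reshape(k, n * m * j)            # flatten each image block
    w = rng.normal(size=(n * m * j, d))            # linear projection (S42)
    tokens = flat @ w                              # "second vector": k tokens
    cls_token = np.zeros((1, d))                   # learnable class vector (S43)
    tokens = np.concatenate([cls_token, tokens])   # "third vector": k+1 tokens
    pos = rng.normal(size=tokens.shape)            # position encoding (S44)
    return tokens + pos                            # "fourth vector": (k+1) x D

blocks = rng.normal(size=(8, 16, 16, 3))
print(embed_blocks(blocks, 512).shape)  # (9, 512)
```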
In step S45, the fourth vector is encoded using a Transformer structure.
Fig. 5 shows a schematic diagram of the architecture of a Transformer encoder. As shown in fig. 5, after the embedded image blocks (i.e., the fourth vector) are input to the Transformer encoder, they are first processed by layer normalization (Layer Normalization) and then by a multi-head attention module for feature enhancement; the output of the multi-head attention module is connected with the embedded image blocks via a residual connection; the result is processed by layer normalization again and then passed through a multi-layer perceptron to extract features; and the obtained output is residually connected with its input once more, yielding the final encoder output.
By encoding, the feature difference between the text overlap and the non-text overlap (i.e., normal text) is made more pronounced.
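The encoder flow of fig. 5 can be sketched compactly; single-head attention, a ReLU stand-in for the MLP activation, and random weights are deliberate simplifications of the described multi-head design, so this illustrates the data flow rather than the disclosure's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of one encoder block: layer norm -> attention -> residual ->
# layer norm -> MLP -> residual, matching the flow described above.

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def encoder_block(x):
    t, d = x.shape
    wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
    h = layer_norm(x)                              # first layer normalization
    q, k, v = h @ wq, h @ wk, h @ wv
    attn = softmax(q @ k.T / np.sqrt(d)) @ v       # (single-head) attention
    x = x + attn                                   # first residual connection
    h = layer_norm(x)                              # second layer normalization
    w1, w2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))
    return x + np.maximum(h @ w1, 0) @ w2          # MLP + second residual

tokens = rng.normal(size=(9, 32))  # (k+1) embedded tokens of dimension D=32
print(encoder_block(tokens).shape)  # (9, 32)
```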
In step S46, the encoded fourth vector is classified using the classification multi-layer perceptron to obtain the text classification results of the class vector and of each image block.
The architecture of the classification multi-layer perceptron can vary; fig. 6 shows a schematic diagram of one such architecture. As shown in fig. 6, the input from the encoder first passes through a linear layer, is activated by an activation function (e.g., GELU), and has its number of channels reduced; it then passes through another linear layer that reduces the number of channels again, producing the final text classification result.
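A minimal sketch of such a classification head follows, assuming two output classes (overlapping text / normal text) and using random weights as stand-ins for learned parameters; the hidden size is an arbitrary illustrative choice.

```python
import numpy as np

# Sketch of step S46's head (fig. 6 style): linear -> GELU -> linear,
# reducing channels down to the two classes. One prediction per token:
# index 0 is the class vector, indices 1..k are the image blocks.

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def classify(tokens, d_hidden=64, n_classes=2, seed=2):
    rng = np.random.default_rng(seed)
    t, d = tokens.shape
    w1 = rng.normal(size=(d, d_hidden))
    w2 = rng.normal(size=(d_hidden, n_classes))
    logits = gelu(tokens @ w1) @ w2   # (t, n_classes) score pairs
    return logits.argmax(-1)          # 0 = normal text, 1 = overlapping text

tokens = np.random.default_rng(3).normal(size=(9, 32))
preds = classify(tokens)
print(preds.shape)  # (9,)
```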
By adopting the above text classification scheme, text classification can be performed on each image block and on the class vector; the whole-image characteristic of the text line image (e.g., text overlapping or not) can then be known from the text classification result of the class vector, and the characteristic of each image block (e.g., text overlapping or not) from the text classification result of that image block.
In some embodiments, the size of a certain text line image may not meet the size requirement of the convolution kernel of the neural network, e.g., the size of the text line image is not a multiple of the size of the convolution kernel. In this case, before the text line image is cut into image blocks, the text line image may be resized while maintaining its aspect ratio, and pixels (e.g., 0) may then be padded into the resized image so that its size becomes an integer multiple of the convolution kernel size; padding here refers to expanding on the basis of the original text line image. In addition, a mask of the padded positions, i.e., a padding_mask, may be recorded so that it is known at which positions padding was performed. Fig. 7 shows a schematic diagram of a text line image after padding.
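The padding step can be sketched as follows; the aspect-ratio-preserving resize is omitted for brevity, and zero padding on the bottom/right edges is an illustrative convention (the disclosure does not specify where the padding is placed).

```python
import numpy as np

# Sketch of the padding step: pad the image with zeros up to the next
# multiple of the block size, and record a padding mask (True = padded),
# i.e., the padding_mask mentioned in the text.

def pad_to_multiple(image, n, m):
    h, w, c = image.shape
    ph, pw = (-h) % n, (-w) % m                    # rows/cols needed
    padded = np.pad(image, ((0, ph), (0, pw), (0, 0)))
    mask = np.zeros(padded.shape[:2], dtype=bool)
    mask[h:, :] = True                             # padded rows
    mask[:, w:] = True                             # padded columns
    return padded, mask

image = np.ones((20, 50, 3))
padded, mask = pad_to_multiple(image, 16, 16)
print(padded.shape, int(mask.sum()))  # (32, 64, 3) 1048
```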
When the text line image has been padded, encoding the fourth vector using the Transformer structure in step S45 of fig. 4 may include applying no attention to the padded region during encoding. By excluding the padded region from the attention mechanism, the feature difference between text overlap and non-overlap (i.e., normal text) is made more pronounced.
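One common way to exclude padded positions from attention, sketched below, is to bias their attention scores to a large negative value before the softmax so they receive near-zero weight; the disclosure does not specify the exact mechanism, so this is an assumed convention.

```python
import numpy as np

# Sketch: mask padded token positions out of attention by biasing their
# scores before the softmax, so padded keys receive (near-)zero weight.

def masked_attention_weights(scores, key_is_padding):
    """scores: t x t attention logits; key_is_padding: length-t bool array."""
    scores = np.where(key_is_padding[None, :], -1e9, scores)
    e = np.exp(scores - scores.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

scores = np.zeros((4, 4))
pad = np.array([False, False, False, True])   # last token is padding
w = masked_attention_weights(scores, pad)
print(np.round(w[0], 3))  # weights over the 4 keys; the padded key gets ~0
```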
In some embodiments, step S24 of adding text lines whose classification result is overlapping text to the second candidate abnormal region set includes: adding a text line to the second candidate abnormal region set if the text classification result of the class vector corresponding to its text line image is overlapping text, N or more consecutive image blocks in the text line image are classified as overlapping text, and the text classification confidence of the class vector and of those consecutive image blocks is greater than a preset classification confidence threshold.
N is a positive integer greater than or equal to 3.
The preset classification confidence threshold may be set empirically or may be derived by self-learning. For example, the classification confidence of text lines where there is text overlap is self-learned, whereby an appropriate preset classification confidence threshold is set. For example, the preset classification confidence threshold may be set to 0.9 or other suitable value.
For example, for a certain text line image: if the text classification result of the corresponding class vector is overlapping text; image block 1, image block 2, and image block 3 in the text line image are consecutive and their text classification results are also overlapping text; and the text classification confidences of the class vector and of these 3 image blocks are all greater than the preset classification confidence threshold, then the text line is added to the second candidate abnormal region set.
By adopting the above technical solution, a text line is considered overlapping text only when the text classification result of the class vector corresponding to its text line image is overlapping text, N or more consecutive image blocks in the text line image are classified as overlapping text, and the text classification confidence of the class vector and of those consecutive image blocks is greater than the preset classification confidence threshold; this improves the accuracy and recall precision of text overlap detection.
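The decision rule above can be sketched as a run-length check; the label strings, function name, and the reading of "more than N" as "N or more consecutive blocks" (consistent with the N = 3 example) are assumptions for illustration.

```python
# Sketch of the decision rule: flag a text line when (1) the class vector
# predicts overlapping text with high confidence, and (2) at least N
# consecutive image blocks predict overlapping text with high confidence.

N = 3
CLASSIFICATION_CONF_THRESHOLD = 0.9  # preset threshold, per the example above

def line_is_overlapping(cls_pred, cls_conf, block_preds, block_confs,
                        n=N, threshold=CLASSIFICATION_CONF_THRESHOLD):
    if cls_pred != "overlap" or cls_conf <= threshold:
        return False
    run = 0  # length of the current run of confident "overlap" blocks
    for pred, conf in zip(block_preds, block_confs):
        run = run + 1 if (pred == "overlap" and conf > threshold) else 0
        if run >= n:
            return True
    return False

preds = ["normal", "overlap", "overlap", "overlap", "normal"]
confs = [0.99, 0.95, 0.97, 0.93, 0.99]
print(line_is_overlapping("overlap", 0.96, preds, confs))  # True
```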
Fig. 8 shows an architectural diagram of a text classifier according to an embodiment of the present disclosure. As shown in fig. 8, the text line image is first padded while maintaining its aspect ratio; the padded image is then cut into image blocks (fig. 8 illustrates a cut into 16 image blocks); the image blocks are linearly transformed in a linear projection layer; position codes and the class vector are added, where "×" in fig. 8 indicates the class vector; encoding is performed in the Transformer encoder; and classification is performed in the MLP to obtain the text classification result.
By adopting the architecture shown in fig. 8, it is possible to perform text classification for each text line image.
Fig. 9 shows an architectural diagram of object detection according to an embodiment of the present disclosure, which adopts the Darknet architecture. As shown in fig. 9, three outputs are obtained after Darknet-53 processing, with dimensions (batch_size, 52, 52, 21), (batch_size, 26, 26, 21), and (batch_size, 13, 13, 21), where 21 = 3 × (2 + 4 + 1): 3 is the number of anchor boxes, 2 is the number of object categories within the anchor boxes (in this disclosure the object categories are overlapping text lines and overlapping regions), 4 is the coordinate offset values (i.e., tx, ty, tw, th), and 1 indicates whether it is an overlapping target. In this architecture there are 9 anchor boxes in total, divided equally among the three feature layers. Multi-scale detection is thus realized, and overlapping targets of all sizes can be detected. In fig. 9, the feature map sizes "52×52", "26×26", and "13×13" and the anchor box count 3 are merely examples; the present disclosure is not limited thereto.
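The channel arithmetic of the three detection heads can be checked as follows; the function name is illustrative.

```python
# The per-cell channel count of each detection head follows from
# channels = num_anchors_per_scale * (num_classes + 4 box offsets + 1 objectness),
# which for 3 anchors and 2 classes (overlapping text line, overlapping
# region) gives 3 * (2 + 4 + 1) = 21.

def head_channels(num_anchors=3, num_classes=2):
    return num_anchors * (num_classes + 4 + 1)

print(head_channels())  # 21
for size in (52, 26, 13):
    print((size, size, head_channels()))  # per-feature-layer output dims
```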
In some embodiments, the text overlap detection method according to embodiments of the present disclosure further includes training a classifier that performs text classification and a target detector that performs target detection.
For deep learning network training, the number of training samples is very important. However, since real text overlap data is very scarce in real environments, the text overlap detection method according to the embodiments of the present disclosure further includes a step of automatically generating training samples, avoiding the large amount of manpower otherwise needed to collect and annotate text overlap data.
The training samples may be automatically generated by at least one of:
(1) Writing characters on a normal text line, wherein the fonts, colors and character strings of the written characters are random;
(2) Extracting foreground characters from a text line image and superimposing the extracted foreground characters on other text line images, as shown by reference numeral 4 in fig. 10: the region marked by reference numeral 4 was originally normal text, and the overlapping text was formed by extracting foreground characters from other positions in the image and superimposing them onto it.
In both of the above two ways, the coordinates of the text line can be obtained by a text recognition algorithm (e.g., OCR), so that it can be determined where to write text or add foreground characters according to the coordinates of the text line.
In addition, automatically generated coordinate information of text lines with overlapping text may also be recorded for use in training the text classifier and the object detector.
After enough training samples are obtained, the classifier for performing text classification and the target detector for performing target detection can be trained by using the training samples so as to improve the classification precision of the classifier and the target detection precision of the target detector.
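A minimal sketch of the random-overlay idea follows (the font and color pools, helper name and annotation fields are hypothetical; an actual generator would render pixels with an image library, while this only builds the annotation record from the OCR-provided line coordinates):

```python
import random

FONTS = ["simhei", "arial", "times"]           # assumed font pool
COLORS = ["black", "red", "blue"]              # assumed color pool
CHARSET = "abcdefghijklmnopqrstuvwxyz0123456789"

def make_overlap_sample(line_box, seed=None):
    """Pick a random font/color/string to write onto a normal text line
    whose coordinates (obtained via OCR) are line_box = (x, y, w, h),
    and record the coordinates for training the classifier and detector."""
    rng = random.Random(seed)
    text = "".join(rng.choice(CHARSET) for _ in range(rng.randint(3, 10)))
    return {
        "text": text,
        "font": rng.choice(FONTS),
        "color": rng.choice(COLORS),
        "box": line_box,                 # where the overlap was written
        "label": "overlapping_text",
    }

sample = make_overlap_sample((10, 20, 200, 32), seed=1)
print(sample["label"], sample["box"])
```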
Fig. 11 is a schematic block diagram of a text overlap detection apparatus according to an embodiment of the present disclosure. As shown in fig. 11, the text overlap detection apparatus comprises: a text recognition module 121, configured to perform text recognition on an object to be detected to obtain the text recognition confidence of text lines in the object to be detected, and to add text lines whose text recognition confidence is lower than a preset recognition confidence threshold to a first candidate abnormal region set; a text classification module 122, configured to intercept, from the object to be detected, a text line image of each text line of the object to be detected, perform text classification on the text line images, and add text lines whose text classification result is overlapping text to a second candidate abnormal region set; a target detection module 123, configured to perform target detection on overlapping text in the object to be detected and add text lines whose target detection result is overlapping text to a third candidate abnormal region set; and a determination module 124, configured to determine the intersection of the first candidate abnormal region set, the second candidate abnormal region set and the third candidate abnormal region set as the text overlap detection result.
With this technical scheme, the first candidate abnormal region set of the object to be detected is obtained by text recognition, the second candidate abnormal region set is obtained by text classification, the third candidate abnormal region set is obtained by target detection, and the text overlap detection result is determined as the intersection of the three candidate abnormal region sets, which improves the accuracy of text overlap detection. In addition, the labor cost of text overlap detection is greatly reduced.
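The fusion step performed by the determination module amounts to a set intersection; a sketch with hypothetical text-line identifiers as set elements:

```python
# Candidate abnormal region sets produced by the three independent branches,
# keyed here by hypothetical text-line ids.
first_set  = {"line_1", "line_3", "line_7"}   # low OCR confidence
second_set = {"line_3", "line_5", "line_7"}   # classifier says overlapping
third_set  = {"line_2", "line_3", "line_7"}   # detector says overlapping

# A text line is reported only when all three branches flag it,
# which suppresses the false positives of any single branch.
result = first_set & second_set & third_set
print(sorted(result))   # ['line_3', 'line_7']
```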
In some embodiments, the text classification module 122 performs text classification on the text line image by: cutting the text line image into a plurality of image blocks, and performing text classification on each of the image blocks using a text classifier that utilizes a category vector, a Transformer structure, and a binary-classification multi-layer perceptron, wherein the category vector is used to integrate the whole-image features of the text line image.
In some embodiments, the text classification module 122 performs text classification on each image block using a text classifier that utilizes a category vector, a Transformer structure and a binary-classification multi-layer perceptron by: performing a linear transformation on a first vector composed of the plurality of image blocks to obtain a second vector; adding the learnable category vector to the second vector to obtain a third vector; adding a position code to the third vector to obtain a fourth vector, wherein the position code characterizes the relative positional relationship of the image blocks; encoding the fourth vector using the Transformer structure; and classifying the encoded fourth vector using the binary-classification multi-layer perceptron to obtain the text classification results of the category vector and of each image block.
In some embodiments, the text classification module 122 is further configured to, prior to cutting the text line image into the plurality of image blocks, resize the text line image while maintaining its aspect ratio, and fill pixels in the resized text line image;
the text classification module 122 is further configured to encode the fourth vector using the Transformer structure without performing the attention mechanism on the filled region during encoding.
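Skipping attention over the filled region can be sketched as a key-padding mask applied before the softmax (a toy single-head sketch; the dimensions and the large-negative-score masking trick are generic, not taken from the disclosure):

```python
import numpy as np

def masked_attention(q, k, v, pad_mask):
    """Single-head scaled dot-product attention that ignores padded positions.
    pad_mask[j] is True where token j comes from the filled (padded) region."""
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores[:, pad_mask] = -1e9                   # padded keys get ~zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = rng.random((5, 8))                      # 5 tokens; last 2 from padding
pad_mask = np.array([False, False, False, True, True])
out, w = masked_attention(tokens, tokens, tokens, pad_mask)
print(np.allclose(w[:, pad_mask], 0.0, atol=1e-6))   # padding receives no attention
```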
In some embodiments, the text classification module 122 adds the text classification result to the second candidate abnormal region set for overlapping text, including adding the text line to the second candidate abnormal region set if the text classification result for the category vector corresponding to the text line image is overlapping text, the text classification result for the text line image having more than N consecutive image blocks is overlapping text, and the text classification confidence of the text classification results for the category vector and the more than N consecutive image blocks is greater than a preset classification confidence threshold.
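The three-condition rule can be sketched as a small predicate (a sketch under assumed data shapes: per-token labels and confidences, with index 0 holding the category vector's result and the rest holding the image blocks' results in order):

```python
def is_overlapping_line(labels, confidences, n, threshold):
    """Return True only when (1) the category vector (index 0) is classified
    as overlapping text, (2) more than n consecutive image blocks are
    classified as overlapping text, and (3) all of those classification
    confidences exceed the threshold."""
    if labels[0] != "overlap" or confidences[0] <= threshold:
        return False
    run = 0
    for label, conf in zip(labels[1:], confidences[1:]):
        if label == "overlap" and conf > threshold:
            run += 1
            if run > n:
                return True
        else:
            run = 0
    return False

labels = ["overlap", "normal", "overlap", "overlap", "overlap", "normal"]
confs  = [0.9,       0.8,      0.95,      0.9,       0.85,      0.7]
print(is_overlapping_line(labels, confs, n=2, threshold=0.8))  # True
```

Lines for which the predicate holds would be added to the second candidate abnormal region set.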
In some embodiments, the object detection module 123 performs object detection on overlapping text in the object to be detected by performing the target detection with overlapping text lines and overlapping regions as detection targets in the object to be detected.
In some embodiments, the object detection module 123 adds text lines whose target detection result is overlapping text to the third candidate abnormal region set by: adding an overlapping text line to the third candidate abnormal region set if the target detection result indicates that the coordinates of the overlapping text line overlap with the coordinates of an overlapping region.
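The coordinate-overlap test between a detected overlapping text line and a detected overlapping region can be sketched as axis-aligned box intersection (boxes assumed here to be (x1, y1, x2, y2); the disclosure does not specify a box format):

```python
def boxes_overlap(line_box, region_box):
    """True if the overlapping-text-line box and the overlapping-region box
    share any area; each box is (x1, y1, x2, y2)."""
    lx1, ly1, lx2, ly2 = line_box
    rx1, ry1, rx2, ry2 = region_box
    return lx1 < rx2 and rx1 < lx2 and ly1 < ry2 and ry1 < ly2

third_candidate_set = []
line, region = (0, 0, 100, 20), (80, 10, 120, 40)
if boxes_overlap(line, region):      # coordinates overlap -> keep the text line
    third_candidate_set.append(line)
print(third_candidate_set)
```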
In some embodiments, the text overlap detection apparatus according to the embodiments of the present disclosure further includes a training module, configured to automatically generate training samples by at least one of: writing text on normal text lines, wherein the font, color, and character string of the written text are random; and performing foreground character extraction on a text line image and superimposing the extracted foreground characters on other text line images; and to train the classifier performing the text classification and the target detector performing the target detection using the training samples.
Specific implementation manners of operations performed by each module in the text overlap detection apparatus according to the embodiments of the present disclosure have been described in detail in related methods, and are not described herein.
Referring now to fig. 12, a schematic diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 12 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 12, the electronic device 600 may include a processing means (e.g., a central processing unit, a graphic processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
In general, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 608 including, for example, magnetic tape, hard disk, etc.; and communication devices 609. The communication devices 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. While fig. 12 shows an electronic device 600 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 609, or from storage means 608, or from ROM 602. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 601.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to electrical wiring, fiber optic cable, RF (radio frequency), and the like, or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be included in the electronic device or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: perform text recognition on an object to be detected to obtain the text recognition confidence of text lines in the object to be detected, and add text lines whose text recognition confidence is lower than a preset recognition confidence threshold to a first candidate abnormal region set; intercept, from the object to be detected, text line images of the text lines of the object to be detected, perform text classification on the text line images, and add text lines whose text classification result is overlapping text to a second candidate abnormal region set; perform target detection on overlapping text in the object to be detected, and add text lines whose target detection result is overlapping text to a third candidate abnormal region set; and determine the intersection of the first candidate abnormal region set, the second candidate abnormal region set and the third candidate abnormal region set as the text overlap detection result.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or hardware. The name of a module is not limited to the module itself in some cases, and for example, the first acquisition module may also be described as "a module that acquires at least two internet protocol addresses".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems-on-Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, example 1 provides a text overlap detection method, including: performing text recognition on an object to be detected to obtain the text recognition confidence of text lines in the object to be detected, and adding text lines whose text recognition confidence is lower than a preset recognition confidence threshold to a first candidate abnormal region set; intercepting, from the object to be detected, text line images of the text lines of the object to be detected, performing text classification on the text line images, and adding text lines whose text classification result is overlapping text to a second candidate abnormal region set; performing target detection on overlapping text in the object to be detected, and adding text lines whose target detection result is overlapping text to a third candidate abnormal region set; and determining the intersection of the first candidate abnormal region set, the second candidate abnormal region set and the third candidate abnormal region set as the text overlap detection result.
According to one or more embodiments of the present disclosure, example 2 provides the method of example 1, wherein performing text classification on the text line image comprises: cutting the text line image into a plurality of image blocks; and performing text classification on each of the image blocks using a text classifier that utilizes a category vector, a Transformer structure, and a binary-classification multi-layer perceptron, wherein the category vector is used to integrate the whole-image features of the text line image.
According to one or more embodiments of the present disclosure, example 3 provides the method of example 2, wherein performing text classification on each of the image blocks using the text classifier that utilizes a category vector, a Transformer structure, and a binary-classification multi-layer perceptron comprises: performing a linear transformation on a first vector composed of the plurality of image blocks to obtain a second vector; adding the learnable category vector to the second vector to obtain a third vector; adding a position code to the third vector to obtain a fourth vector, wherein the position code characterizes the relative positional relationship of the image blocks; encoding the fourth vector using the Transformer structure; and classifying the encoded fourth vector using the binary-classification multi-layer perceptron to obtain the text classification results of the category vector and of each image block.
In accordance with one or more embodiments of the present disclosure, example 4 provides the method of example 3, wherein, prior to the cutting the text line image into the plurality of image blocks, the method further comprises adjusting a size of the text line image while maintaining an aspect ratio of the text line image, and filling pixels in the adjusted text line image;
The encoding of the fourth vector using the Transformer structure includes: encoding the fourth vector using the Transformer structure without performing the attention mechanism on the padded region during encoding.
According to one or more embodiments of the present disclosure, example 5 provides the method of example 3 or 4, wherein adding text lines whose text classification result is overlapping text to the second candidate abnormal region set comprises: adding a text line to the second candidate abnormal region set if the text classification result of the category vector corresponding to the text line image is overlapping text, the text classification results of more than N consecutive image blocks in the text line image are overlapping text, and the text classification confidences of the category vector and of the more than N consecutive image blocks are all greater than a preset classification confidence threshold.
According to one or more embodiments of the present disclosure, example 6 provides the method of example 1, wherein performing target detection on overlapping text in the object to be detected comprises: performing the target detection with overlapping text lines and overlapping regions as detection targets in the object to be detected.
Example 7 provides the method of example 6, wherein adding text lines whose target detection result is overlapping text to the third candidate abnormal region set comprises: adding an overlapping text line to the third candidate abnormal region set if the target detection result indicates that the coordinates of the overlapping text line overlap with the coordinates of an overlapping region.
According to one or more embodiments of the present disclosure, example 8 provides the method of example 1, wherein the method further comprises: automatically generating training samples by at least one of: writing text on normal text lines, wherein the font, color, and character string of the written text are random; and performing foreground character extraction on a text line image and superimposing the extracted foreground characters on other text line images; and training the classifier performing the text classification and the target detector performing the target detection with the training samples.
The foregoing description is merely illustrative of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the specific combinations of the features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, a technical solution formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims. The specific manner in which the various modules of the apparatus of the above embodiments perform their operations has been described in detail in the method embodiments and will not be detailed here.

Claims (11)

1.一种文本重叠检测方法,其特征在于,包括:1. A text overlap detection method, characterized in that it includes: 对待检测对象进行文字识别,得到所述待检测对象中文本行的文字识别置信度,将所述文字识别置信度低于预设识别置信度阈值的文本行添加到第一候选异常区域集中;The text recognition of the object to be detected is performed to obtain the text recognition confidence score of the text lines in the object to be detected. The text lines with the text recognition confidence score lower than the preset recognition confidence threshold are added to the first candidate abnormal region set. 从所述待检测对象中截取所述待检测对象的各个文本行的文本行图像,对所述文本行图像进行文本分类,将文本分类结果为重叠文本的文本行添加到第二候选异常区域集中;Text line images of each text line of the object to be detected are extracted from the object to be detected. Text line images are classified. Text lines whose text classification results are overlapping text are added to the second candidate abnormal region set. 对所述待检测对象中的重叠文本进行目标检测,将目标检测结果为重叠文本的文本行添加到第三候选异常区域集中;Target detection is performed on the overlapping text in the object to be detected, and the text lines whose target detection results are overlapping text are added to the third candidate abnormal region set; 将所述第一候选异常区域集、所述第二候选异常区域集和所述第三候选异常区域集的交集确定为文本重叠检测结果。The intersection of the first set of candidate abnormal regions, the second set of candidate abnormal regions, and the third set of candidate abnormal regions is determined as the text overlap detection result. 2.根据权利要求1所述的方法,其特征在于,所述对所述文本行图像进行文本分类,包括:2. The method according to claim 1, wherein the text classification of the text line image includes: 将所述文本行图像切割成多个图像块;The text line image is cut into multiple image blocks; 采用利用了类别向量、transformer结构和二分类多层感知器的文本分类器,对每个所述图像块进行文本分类,其中,所述类别向量用于整合所述文本行图像的整图特征。A text classifier that utilizes category vectors, a transformer structure, and a binary multilayer perceptron is used to classify the text for each image patch, wherein the category vectors are used to integrate the overall image features of the text line image. 3.根据权利要求2所述的方法,其特征在于,所述采用利用了类别向量、transformer结构和二分类多层感知器的文本分类器,对每个所述图像块进行文本分类,包括:3. 
The method according to claim 2, characterized in that, the step of using a text classifier that utilizes category vectors, a transformer structure, and a binary multilayer perceptron to perform text classification on each image patch includes: 对由所述多个图像块组成的第一向量进行线性变换,得到第二向量;A second vector is obtained by performing a linear transformation on the first vector composed of the plurality of image blocks; 在所述第二向量中添加可学习的所述类别向量,得到第三向量;Add the learnable category vector to the second vector to obtain the third vector; 在所述第三向量中添加位置编码,得到第四向量,其中所述位置编码用于表征各个所述图像块的相对位置关系;A positional code is added to the third vector to obtain a fourth vector, wherein the positional code is used to characterize the relative positional relationship of each of the image blocks; 使用所述transformer结构对所述第四向量进行编码;The fourth vector is encoded using the transformer structure. 利用所述二分类多层感知器对编码后的第四向量进行分类,得到所述类别向量和各个所述图像块的文本分类结果。The encoded fourth vector is classified using the binary classification multilayer perceptron to obtain the category vector and the text classification results of each image patch. 4.根据权利要求3所述的方法,其特征在于,在所述将所述文本行图像切割成多个图像块之前,所述方法还包括:在保持所述文本行图像的长宽比的情况下调整所述文本行图像的尺寸,并在调整后的文本行图像中填充像素;4. The method according to claim 3, characterized in that, before cutting the text line image into multiple image blocks, the method further includes: adjusting the size of the text line image while maintaining the aspect ratio of the text line image, and filling the adjusted text line image with pixels; 所述使用所述transformer结构对所述第四向量进行编码,包括:使用所述transformer结构对所述第四向量进行编码,以及在编码过程中不对填充区域执行注意力机制。The step of encoding the fourth vector using the transformer structure includes: encoding the fourth vector using the transformer structure, and not performing an attention mechanism on the filled region during the encoding process. 5.根据权利要求2至4中任一项所述的方法,其特征在于,所述将文本分类结果为重叠文本的文本行添加到第二候选异常区域集中,包括:5. 
The method according to any one of claims 2 to 4, characterized in that adding the text lines whose text classification result is overlapping text to the second candidate anomaly region set includes: 若所述文本行图像所对应的类别向量的文本分类结果为重叠文本、所述文本行图像中有连续N个以上的图像块的文本分类结果为重叠文本、而且所述类别向量和所述连续N个以上的图像块的文本分类结果的文本分类置信度均大于预设分类置信度阈值,则将该文本行添加到所述第二候选异常区域集中。If the text classification result of the category vector corresponding to the text line image is overlapping text, the text classification result of more than N consecutive image blocks in the text line image is overlapping text, and the text classification confidence of the category vector and the text classification result of more than N consecutive image blocks is greater than the preset classification confidence threshold, then the text line is added to the second candidate abnormal region set. 6.根据权利要求1所述的方法,其特征在于,所述对所述待检测对象中的重叠文本进行目标检测,包括:6. The method according to claim 1, wherein the target detection of overlapping text in the object to be detected includes: 将重叠文本行和重叠区域作为所述待检测对象中的检测目标,来执行所述目标检测。The overlapping text lines and overlapping regions are used as detection targets in the object to be detected to perform the target detection. 7.根据权利要求6所述的方法,其特征在于,所述将目标检测结果为重叠文本的文本行添加到第三候选异常区域集中,包括:7. The method according to claim 6, wherein adding the text lines whose target detection result is overlapping text to the third candidate anomaly region set comprises: 若目标检测结果指示某一重叠文本行的坐标与某一重叠区域的坐标有重叠,则将该重叠文本行添加到所述第三候选异常区域集中。If the target detection result indicates that the coordinates of a certain overlapping text line overlap with the coordinates of a certain overlapping region, then the overlapping text line is added to the third candidate abnormal region set. 8.根据权利要求1所述的方法,其特征在于,所述方法还包括:8. 
The method according to claim 1, characterized in that the method further comprises: 通过以下至少一种方式自动生成训练样本:在正常文本行上写入文字,其中被写入文字的字体、颜色和字符串均是随机的;对所述文本行图像进行前景字符提取,并将提取到的前景字符叠加到其他文本行图像上;Training samples are automatically generated using at least one of the following methods: writing text onto normal text lines, wherein the font, color, and string of the written text are all random; extracting foreground characters from the text line images and overlaying the extracted foreground characters onto other text line images; 利用所述训练样本对执行所述文本分类的分类器和执行所述目标检测的目标检测器进行训练。The training samples are used to train the classifier that performs the text classification and the target detector that performs the target detection. 9.一种文本重叠检测装置,其特征在于,包括:9. A text overlap detection device, characterized in that it comprises: 文字识别模块,用于对待检测对象进行文字识别,得到所述待检测对象中文本行的文字识别置信度,将所述文字识别置信度低于预设识别置信度阈值的文本行添加到第一候选异常区域集中;The text recognition module is used to perform text recognition on the object to be detected, obtain the text recognition confidence score of the text lines in the object to be detected, and add the text lines with the text recognition confidence score lower than the preset recognition confidence threshold to the first candidate abnormal region set. 文本分类模块,用于从所述待检测对象中截取所述待检测对象的各个文本行的文本行图像,对所述文本行图像进行文本分类,将文本分类结果为重叠文本的文本行添加到第二候选异常区域集中;The text classification module is used to extract text line images of each text line of the object to be detected from the object to be detected, perform text classification on the text line images, and add the text lines whose text classification results are overlapping text to the second candidate abnormal region set. 目标检测模块,用于对所述待检测对象中的重叠文本进行目标检测,将目标检测结果为重叠文本的文本行添加到第三候选异常区域集中;The target detection module is used to perform target detection on overlapping text in the object to be detected, and add the text lines whose target detection results are overlapping text to the third candidate abnormal region set. 
a determination module, configured to determine the intersection of the first candidate anomaly region set, the second candidate anomaly region set and the third candidate anomaly region set as the text overlap detection result.

10. A computer-readable medium on which a computer program is stored, characterized in that, when the program is executed by a processing device, it implements the steps of the method according to any one of claims 1-8.

11. An electronic device, characterized in that it comprises:

a storage device on which a computer program is stored; and

a processing device, configured to execute the computer program in the storage device to implement the steps of the method according to any one of claims 1-8.
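The voting scheme of claims 1 and 9 — three independent detectors each proposing a candidate set, with the final result taken as the intersection of all three — can be sketched in a few lines. All names here (`TextLine`, the boolean flags, the threshold value) are illustrative assumptions for the sketch, not identifiers from the patent:

```python
# Minimal sketch of the three-branch intersection described in the claims.
from dataclasses import dataclass

@dataclass(frozen=True)
class TextLine:
    line_id: int
    ocr_confidence: float   # confidence reported by text recognition
    cls_is_overlap: bool    # text classifier labels this line "overlapping text"
    det_is_overlap: bool    # object detector flags a line/region overlap here

def detect_overlapping_lines(lines, ocr_conf_threshold=0.5):
    # Branch 1: lines whose recognition confidence falls below the threshold.
    s1 = {l.line_id for l in lines if l.ocr_confidence < ocr_conf_threshold}
    # Branch 2: lines the text classifier labels as overlapping text.
    s2 = {l.line_id for l in lines if l.cls_is_overlap}
    # Branch 3: lines the object detector flags as overlapping.
    s3 = {l.line_id for l in lines if l.det_is_overlap}
    # Final detection result: the intersection of the three candidate sets.
    return s1 & s2 & s3
```

Requiring all three branches to agree is what gives the claimed method its precision: a line flagged by only one or two branches is discarded as a likely false positive.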
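Claim 8's second sample-generation strategy — extracting foreground characters from one text line image and pasting them onto another to synthesize an overlapping-text training sample — can be sketched without any imaging library by treating grayscale images as nested lists. The threshold value and helper names are assumptions for illustration, not from the patent:

```python
# Dependency-free sketch: foreground extraction plus overlay, the second
# training-sample synthesis strategy of claim 8. Images are grayscale
# nested lists of 0-255 pixel values; dark pixels are taken as foreground.
def extract_foreground(img, threshold=128):
    """Return {(row, col): value} for pixels darker than the threshold."""
    return {(r, c): v
            for r, row in enumerate(img)
            for c, v in enumerate(row)
            if v < threshold}

def overlay(fg, target):
    """Paste extracted foreground pixels onto a copy of the target image."""
    out = [row[:] for row in target]
    for (r, c), v in fg.items():
        if r < len(out) and c < len(out[0]):
            out[r][c] = v
    return out
```

Because the pasted characters come from real text lines, the synthesized sample keeps realistic stroke shapes while guaranteeing a known "overlapping text" label for training the classifier and detector.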
CN202211678556.7A 2022-12-26 2022-12-26 Text overlap detection method, device, medium and electronic equipment Active CN115937864B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211678556.7A CN115937864B (en) 2022-12-26 2022-12-26 Text overlap detection method, device, medium and electronic equipment


Publications (2)

Publication Number Publication Date
CN115937864A CN115937864A (en) 2023-04-07
CN115937864B true CN115937864B (en) 2025-11-21

Family

ID=86552068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211678556.7A Active CN115937864B (en) 2022-12-26 2022-12-26 Text overlap detection method, device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115937864B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117496274B (en) * 2023-12-29 2024-06-11 墨卓生物科技(浙江)有限公司 Classification counting method, system and storage medium based on liquid drop images
CN119152172A (en) * 2024-09-27 2024-12-17 武汉新烽光电股份有限公司 Method, device, equipment and storage medium for positioning termite nest of dam
CN119360393B (en) * 2024-09-30 2025-09-09 上海哔哩哔哩科技有限公司 Character recognition method, device, equipment and computer medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113034421A (en) * 2019-12-06 2021-06-25 腾讯科技(深圳)有限公司 Image detection method, device and storage medium
CN114255419A (en) * 2021-12-03 2022-03-29 科大讯飞股份有限公司 Text recognition method, device, electronic device and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4275973B2 (en) * 2003-03-20 2009-06-10 株式会社リコー Retouched image extraction apparatus, program, storage medium, and retouched image extraction method
CN113821652B (en) * 2021-01-21 2025-12-16 北京沃东天骏信息技术有限公司 Model data processing method, device, electronic equipment and computer readable medium


Also Published As

Publication number Publication date
CN115937864A (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN115937864B (en) Text overlap detection method, device, medium and electronic equipment
CN114037990B (en) A character recognition method, device, equipment, medium and product
CN111310770B (en) Target detection method and device
CN113222983A (en) Image processing method, image processing device, readable medium and electronic equipment
CN116129452B (en) Method, application method, device, equipment and medium for generating document understanding model
CN114581336B (en) Image restoration method, device, equipment, medium and product
CN110852258A (en) Object detection method, device, equipment and storage medium
CN114067327B (en) Text recognition method, device, readable medium and electronic device
CN114973271A (en) Text information extraction method, extraction system, electronic device and storage medium
CN114463768A (en) Form recognition method and device, readable medium and electronic equipment
CN111797822B (en) Text object evaluation method, device and electronic device
KR102510881B1 (en) Method and apparatus for learning key point of based neural network
CN114429628B (en) Image processing methods, apparatuses, readable storage media, and electronic devices
CN112418249B (en) Mask image generation method, device, electronic device and computer readable medium
CN111310595B (en) Method and apparatus for generating information
CN110287350B (en) Image retrieval method, device and electronic equipment
CN114120364B (en) Image processing method, image classification method, device, medium and electronic equipment
CN111753836B (en) Text recognition method, device, computer readable medium and electronic device
CN120279405A (en) Method and device for segmenting farmland remote sensing images
CN113191251A (en) Method and device for detecting stroke order, electronic equipment and storage medium
CN110852242A (en) Watermark identification method, device, equipment and storage medium based on multi-scale network
CN114004229B (en) Text recognition method, device, readable medium and electronic device
CN110807784B (en) Method and device for segmenting an object
CN120374627B (en) Metal impurity identification method, device, equipment and medium based on artificial intelligence
CN116563701B (en) Target object detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant