
CN115953789A - Adaptive text detection method, device, equipment and medium - Google Patents


Info

Publication number
CN115953789A
CN115953789A
Authority
CN
China
Prior art keywords
image
text
feature
character
adaptive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211618750.6A
Other languages
Chinese (zh)
Inventor
康家杰
秦传波
麦超云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuyi University Fujian
Original Assignee
Wuyi University Fujian
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuyi University Fujian filed Critical Wuyi University Fujian
Priority to CN202211618750.6A
Publication of CN115953789A
Legal status: Pending


Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Character Input (AREA)

Abstract

An embodiment of the present application provides an adaptive text detection method, device, equipment, and medium. The method includes inputting a text image into a text detection network; extracting, through the text detection network, feature information of text regions from the text image to obtain a feature image; detecting the boundaries of the text regions from the feature image and locating the boundaries through multiple corner points to generate region candidate boxes for the text regions; obtaining an adaptive threshold map from the feature image and segmenting it so that different characters belonging to the same text region are assigned to the same segmented region, yielding a segmentation map; and combining the region candidate map and the segmentation map to obtain the text detection result. The method can recognize and detect image text of arbitrary shape and in dense regions, improving the robustness of text detection.

Description

Adaptive text detection method, device, equipment and medium

Technical Field

The embodiments of the present application relate to, but are not limited to, the technical field of image processing, and in particular to an adaptive text detection method, device, equipment, and medium.

Background

The text detection task is to determine the position of text in an image and recognize the text it expresses. In natural scene images, text is highly diverse: the results of text detection are affected by the size, font, orientation, scale, and shape of the text, as well as by complex backgrounds and interference such as uneven brightness, blur, and low resolution. All of these factors can make text detection results inaccurate.

Current text detection methods represent a text region by a rotated rectangular bounding box with four fixed corner points (upper-left, upper-right, lower-right, lower-left), expressed as two-dimensional coordinates in clockwise order. This is not suitable for detecting text of arbitrary shape. In addition, when text regions are very close to each other, segmentation of dense text regions tends to fail: only one of them is predicted, resulting in missed detections. For scene images with dense text regions and large variations in text curvature and scale, the robustness of current text detection methods is difficult to guarantee.

Summary of the Invention

The following is an overview of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.

The embodiments of the present application aim to solve at least one of the technical problems in the prior art. The embodiments of the present application provide an adaptive text detection method, device, equipment, and medium, which can recognize and detect image text of arbitrary shape and in dense regions, improving the robustness of text detection.

According to an embodiment of the first aspect of the present application, an adaptive text detection method includes:

acquiring a text image to be detected, and inputting the text image into a text detection network;

extracting feature information of a text region from the text image, and obtaining a feature image according to the feature information;

performing boundary detection processing on the feature image to obtain the boundary of the text region, locating the boundary of the text region through a plurality of corner points, generating a region candidate box for the text region, and obtaining a region candidate map, where the number of corner points corresponds to the scale of the text region;

performing adaptive thresholding processing on the feature image to obtain an adaptive threshold map, performing image segmentation processing on the adaptive threshold map, and segmenting different characters belonging to the same text region into the same segmented region to obtain a segmentation map;

combining the region candidate map and the segmentation map to perform text detection, and obtaining a text detection result.

In some embodiments of the first aspect of the present application, extracting the feature information of the text region from the text image and obtaining the feature image according to the feature information includes:

generating, by a plurality of residual blocks of a backbone network of the text detection network, a plurality of basic feature maps containing the feature information of the text region according to the text image;

generating, by a feature pyramid network of the text detection network, a plurality of fused feature maps of different scales according to the plurality of basic feature maps;

performing a concatenation operation and upsampling operations on the plurality of fused feature maps to obtain the feature image.

In some embodiments of the first aspect of the present application, the fused feature maps include a first feature map, a second feature map, a third feature map, a fourth feature map, and a fifth feature map; the sizes of the first feature map, the second feature map, the third feature map, the fourth feature map, and the fifth feature map decrease in sequence; and the first feature map, the second feature map, the third feature map, the fourth feature map, and the fifth feature map have the same number of channels.

In some embodiments of the first aspect of the present application, performing the concatenation operation and upsampling operations on the plurality of fused feature maps to obtain the feature image includes:

concatenating the first feature map, the result of upsampling the second feature map, the result of upsampling the third feature map, the result of upsampling the fourth feature map, and the result of upsampling the fifth feature map to obtain the feature image.

In some embodiments of the first aspect of the present application, performing boundary detection processing on the feature image to obtain the boundary of the text region and locating the boundary of the text region through a plurality of corner points to generate the region candidate box of the text region includes:

performing boundary detection processing on the feature image, and representing the boundary of the text region in the feature image by initial corner points;

calculating the offsets between different corner points, and performing curvature prediction according to the offsets to obtain the curvature of the text region;

adding or removing corner points according to the curvature to locate the boundary of the text region in the feature image, and then locating the boundary of the text region through the plurality of corner points to generate the region candidate box of the text region.

In some embodiments of the first aspect of the present application, performing adaptive thresholding processing on the feature image to obtain the adaptive threshold map and performing image segmentation processing on the adaptive threshold map to segment different characters belonging to the same text region into the same segmented region and obtain the segmentation map includes:

calculating pixel values of the feature image, and performing adaptive thresholding processing according to the pixel values to obtain the adaptive threshold map;

calculating feature distances between characters according to the pixel values, and segmenting different characters belonging to the same text region into the same segmented region according to the feature distances, so as to perform image segmentation processing on the adaptive threshold map and obtain the segmentation map.

In some embodiments of the first aspect of the present application, the thresholds at different positions of the text region in the adaptive threshold map are different, and the threshold at the boundary of the text region is smaller than the threshold at the center of the text region.

According to an embodiment of the second aspect of the present application, an adaptive text detection device includes:

an input module, configured to acquire a text image to be detected and input the text image into a text detection network;

a feature extraction module, configured to extract feature information of a text region from the text image and obtain a feature image according to the feature information;

an adaptive corner detection module, configured to perform boundary detection processing on the feature image to obtain the boundary of the text region, and locate the boundary of the text region through a plurality of corner points to generate a region candidate box for the text region, where the number of corner points corresponds to the scale of the text region;

an adaptive threshold segmentation module, configured to perform adaptive thresholding processing on the feature image to obtain an adaptive threshold map, perform image segmentation processing on the adaptive threshold map, and segment different characters belonging to the same text region into the same segmented region to obtain a segmentation map;

an output module, configured to combine the region candidate map and the segmentation map to perform text detection and obtain a text detection result.

According to an embodiment of the third aspect of the present application, an electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the adaptive text detection method described above.

According to an embodiment of the fourth aspect of the present application, a computer-readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the adaptive text detection method described above.

The above solution has at least the following beneficial effects: combining the two ideas of regression-based and segmentation-based methods, an adaptive corner detection method and an adaptive threshold segmentation method are proposed. Different numbers of corner points are generated for text of different scales for localization, a text segmentation map is generated using the adaptive threshold method, and joint optimization is performed with the generated corner candidate boxes to obtain a visualized text detection result. The solution can recognize and detect image text of arbitrary shape and in dense regions, improving the robustness of text detection.

Brief Description of the Drawings

The accompanying drawings are provided for a further understanding of the technical solution of the present application and constitute a part of the specification. Together with the embodiments of the present application, they serve to explain the technical solution of the present application and do not constitute a limitation on it.

Fig. 1 is a step diagram of the adaptive text detection method provided by an embodiment of the present application;

Fig. 2 is a diagram of the sub-steps of step S200;

Fig. 3 is a diagram of the sub-steps of step S300;

Fig. 4 is a diagram of the sub-steps of step S400;

Fig. 5 is a structural diagram of the adaptive text detection device provided by an embodiment of the present application;

Fig. 6 is a structural diagram of the electronic device provided by an embodiment of the present application.

Detailed Description

To make the purpose, technical solution, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application, not to limit it.

It should be noted that although functional modules are divided in the device diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed with a module division different from that in the device, or in an order different from that in the flowcharts. The terms "first", "second", and the like in the specification, claims, or the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

The embodiments of the present application are further described below with reference to the accompanying drawings.

An embodiment of the present application provides an adaptive text detection method.

Referring to Fig. 1, the adaptive text detection method includes:

Step S100, acquiring a text image to be detected, and inputting the text image into a text detection network;

Step S200, extracting feature information of a text region from the text image, and obtaining a feature image according to the feature information;

Step S300, performing boundary detection processing on the feature image to obtain the boundary of the text region, and locating the boundary of the text region through a plurality of corner points to generate a region candidate box for the text region;

Step S400, performing adaptive thresholding processing on the feature image to obtain an adaptive threshold map, performing image segmentation processing on the adaptive threshold map, and segmenting different characters belonging to the same text region into the same segmented region to obtain a segmentation map;

Step S500, combining the region candidate map and the segmentation map to perform text detection, and obtaining a text detection result.

For step S100, a text image to be detected is acquired, where the text image is an image containing the text to be detected. The image may be a picture taken by a photographic device such as a camera, or a frame extracted from a video.

The text image is input into a text detection network that has already been trained. It can be understood that a large number of training text images are input into an untrained text detection network, the network is trained, and its parameters are adjusted through a loss function until the network converges, at which point training of the text detection network is complete.

Referring to Fig. 2, for step S200, the feature information of the text region is extracted from the text image by the text detection network, and the feature image is obtained according to the feature information, including but not limited to the following steps:

Step S210, generating, by a plurality of residual blocks of the backbone network of the text detection network, a plurality of basic feature maps containing the feature information of the text region according to the text image;

Step S220, generating, by the feature pyramid network of the text detection network, a plurality of fused feature maps of different scales according to the plurality of basic feature maps;

Step S230, performing a concatenation operation and upsampling operations on the plurality of fused feature maps to obtain the feature image.

For step S210, the backbone network is ResNet-50, which is composed of multiple residual blocks. The residual blocks of ResNet-50 generate, from the text image, multiple basic feature maps containing the feature information of the text region. Specifically, C3, C4, and C5 can be selected as the basic feature maps, with corresponding strides of 8, 16, and 32, respectively.

Specifically, the ResNet-50 network contains 49 convolutional layers and one fully connected layer. Its structure can be divided into seven parts. The first part contains no residual blocks and mainly performs convolution, normalization, activation, and max-pooling on the input. The second, third, fourth, and fifth parts all contain residual blocks, including structures that change only the channel dimension of a residual block without changing its spatial size. In ResNet-50, each residual block has three convolutional layers. The input of ResNet-50 is 224×224×3; after the convolution computations of the first five parts, the output is 7×7×2048. A pooling layer converts this into a feature vector, and a fully connected layer computes and outputs class probabilities from the feature vector.
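
The residual structure described above can be sketched in a few lines of plain Python, with simple functions standing in for the three convolution layers of a ResNet-50 bottleneck block. This is an illustrative sketch of the shortcut idea only, not the patent's implementation; the function names are assumptions.

```python
# Minimal sketch of the residual ("shortcut") structure of a ResNet-50
# bottleneck block: y = F(x) + x, where F is three stacked transforms.
# Vectors are plain Python lists; f1, f2, f3 stand in for the three
# convolutional layers of the block.

def bottleneck(x, f1, f2, f3, project=None):
    """Apply three stacked transforms and add the identity shortcut.

    If the block changes only the dimension (the structure mentioned
    above), `project` maps x onto the new dimension before the addition;
    otherwise the input is added back unchanged.
    """
    y = f3(f2(f1(x)))
    shortcut = project(x) if project is not None else x
    return [a + b for a, b in zip(y, shortcut)]
```

The key property is that the block learns a residual on top of the identity mapping, which is what makes very deep backbones such as ResNet-50 trainable.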

For step S220, the feature pyramid network of the text detection network generates multiple fused feature maps of different scales according to the multiple basic feature maps.

The fused feature maps include a first feature map, a second feature map, a third feature map, a fourth feature map, and a fifth feature map, denoted P3, P4, P5, P6, and P7, respectively. The sizes of the first through fifth feature maps decrease in sequence, and all five have the same number of channels.

Specifically, the size of the first feature map is 1/8 of the input image, the size of the second feature map is 1/16 of the input image, the size of the third feature map is 1/32 of the input image, the size of the fourth feature map is 1/64 of the input image, and the size of the fifth feature map is 1/128 of the input image. The number of channels of the first through fifth feature maps is uniformly set to 256.
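
The pyramid geometry above can be made concrete with a small shape calculation. This is a pure-Python sketch of the sizes only (no deep-learning framework assumed); `pyramid_shapes` is an illustrative helper name, not from the patent.

```python
# Shapes of the fused feature maps P3..P7 for an h x w input image:
# P3 is 1/8 of the input, P4 is 1/16, ..., P7 is 1/128, and every level
# has 256 channels, matching the description above.

def pyramid_shapes(h, w, channels=256):
    """Return (height, width, channels) for each pyramid level P3..P7."""
    shapes = {}
    for level in range(3, 8):          # levels P3 .. P7
        stride = 2 ** level            # 8, 16, 32, 64, 128
        shapes[f"P{level}"] = (h // stride, w // stride, channels)
    return shapes
```

For a 1024×1024 input, for example, P3 is 128×128×256 and P7 is 8×8×256, so each level halves the spatial resolution of the one below it while keeping the channel count fixed.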

The feature pyramid network passes high-level features downward to supplement the semantics of the lower levels, so that high-resolution, semantically strong features can be obtained, which benefits the detection of small targets. The feature pyramid network includes bottom-up, top-down, and lateral connections. The two feature layers joined by a lateral connection must have the same spatial size; the purpose is mainly to exploit the localization detail of the lower layers, since feature maps extracted by the lower pyramid levels contain more localization detail, while feature maps extracted by the top pyramid levels contain more target feature information. Feature maps extracted at different pyramid levels can be used for prediction, drawing features of different sizes from different layers of the network without adding extra computation. The upper half takes only the bottom level of the feature pyramid for prediction, while the lower half makes separate predictions for all levels of the feature pyramid and finally integrates all prediction results.

For step S230, a concatenation operation and upsampling operations are performed on the multiple fused feature maps to obtain the feature image. Specifically, the first feature map is concatenated with the result of upsampling the second feature map, the result of upsampling the third feature map, the result of upsampling the fourth feature map, and the result of upsampling the fifth feature map to obtain the feature image, which fuses the multi-scale fused feature maps. This can be expressed by the formula F = P3 * Up(P4) * Up(P5) * Up(P6) * Up(P7), where F is the feature image and Up is the upsampling operation.
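
As a hedged illustration of step S230 (not the patent's implementation), the formula can be sketched in pure Python: each feature map is a list of channels, each channel a 2-D list, `Up` is nearest-neighbor upsampling by an integer factor, and the "*" in the formula is realized as concatenation along the channel axis. Real systems would use learned or bilinear upsampling in a deep-learning framework.

```python
# Sketch of F = P3 * Up(P4) * Up(P5) * ... : upsample each higher pyramid
# level to P3's spatial size (factors 2, 4, 8, ...) and concatenate all
# channels into one fused feature image.

def up(channel, factor):
    """Nearest-neighbor upsample one 2-D channel by an integer factor."""
    return [
        [v for v in row for _ in range(factor)]
        for row in channel
        for _ in range(factor)
    ]

def fuse(p3, *higher):
    """Concatenate P3 with the higher levels along the channel axis.

    The i-th higher level (P4, P5, ...) is upsampled by 2**(i+1) so that
    every channel ends up at P3's spatial size before concatenation.
    """
    out = list(p3)
    for i, p in enumerate(higher):
        factor = 2 ** (i + 1)
        out.extend(up(ch, factor) for ch in p)
    return out
```

Because every level is brought to P3's resolution first, the fused feature image keeps P3's spatial size while its channel count is the sum of the channel counts of all the inputs.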

Referring to Fig. 3, for step S300, boundary detection processing is performed on the feature image to obtain the boundary of the text region, and the boundary of the text region is located through a plurality of corner points to generate the region candidate box of the text region, including but not limited to the following steps:

Step S310, performing boundary detection processing on the feature image, and representing the boundary of the text region in the feature image by initial corner points;

Step S320, calculating the offsets between different corner points, and performing curvature prediction according to the offsets to obtain the curvature of the text region;

Step S330, adding or removing corner points according to the curvature to locate the boundary of the text region in the feature image, then locating the boundary of the text region through the plurality of corner points to generate the region candidate box of the text region, and obtaining the region candidate map.

Previous corner detection methods usually describe a text region by a polygon with a fixed number of points. For example, horizontal text is represented by 2 points (upper-left and lower-right corners), and multi-oriented text is represented by the 4 points of a bounding box (upper-left, upper-right, lower-right, lower-left). However, for scene text of varying scale and complex shape, a fixed number of points is insufficient to describe the text region.

In this embodiment, the adaptive corner detection method first performs boundary detection processing on the feature image to obtain the boundary of the text region in the feature image, with initial corner points representing that boundary. For example, the initial corner points may be the four corners on the boundary of the text region (upper-left, upper-right, lower-right, lower-left), whose coordinates are expressed in clockwise order as (x1, y1), (x2, y2), (x3, y3), (x4, y4). The offsets between different corner points are calculated from the coordinates of the four initial corner points, and curvature prediction is performed according to the offsets to obtain the curvature of the text region. According to the curvature of the text region, one or more new pairs of corner points are adaptively added to locate the boundary of the text region in the feature image; the boundary of the text region is then located through the plurality of corner points to generate the region candidate box of the text region, and the region candidate map is obtained. In addition, when the predicted text region is horizontal, only 2 corner points on the text boundary (upper-left and lower-right) need to be kept, which reduces the amount of computation. The number of corner points corresponds to the scale of the text region.
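
The adaptive corner budget can be illustrated with a toy rule: compute the offsets between consecutive corners, derive a crude curvature score, and keep 2, 4, or more corner points accordingly. The scoring rule and thresholds below are illustrative assumptions for the sketch, not the patent's formulas, which are not disclosed at this level of detail.

```python
# Hedged sketch of adaptive corner counting. Corners are (x, y) tuples in
# clockwise order starting from the upper-left, as in the text above.

def offsets(corners):
    """(dx, dy) between consecutive corners, clockwise, closing the loop."""
    n = len(corners)
    return [
        (corners[(i + 1) % n][0] - corners[i][0],
         corners[(i + 1) % n][1] - corners[i][1])
        for i in range(n)
    ]

def is_axis_aligned_rect(corners):
    """True for a horizontal (axis-aligned) rectangle."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = corners
    return y1 == y2 and x2 == x3 and y3 == y4 and x4 == x1

def corner_count(corners, bend_threshold=0.25):
    """Pick a corner budget: 2 for horizontal text, 4 for oriented text,
    and an extra pair when the region appears curved."""
    if is_axis_aligned_rect(corners):
        return 2  # horizontal text: upper-left and lower-right suffice
    offs = offsets(corners)
    # Deviation from a parallelogram as a crude curvature proxy; since the
    # four offsets sum to zero, checking one opposite pair is enough.
    dx0, dy0 = offs[0]
    dx2, dy2 = offs[2]
    dev = abs(dx0 + dx2) + abs(dy0 + dy2)
    perim = sum(abs(dx) + abs(dy) for dx, dy in offs) or 1
    return 4 if dev / perim < bend_threshold else 6  # add a pair if curved
```

The point of the sketch is the decision structure, matching the description above: a horizontal region drops to 2 corners, an oriented quadrilateral keeps 4, and a bent region gains an extra pair so the polygon can follow the curve.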

Referring to Fig. 4, for step S400, adaptive thresholding processing is performed on the feature image to obtain the adaptive threshold map, and image segmentation processing is performed on the adaptive threshold map to segment different characters belonging to the same text region into the same segmented region and obtain the segmentation map, including but not limited to the following steps:

Step S410, calculating pixel values of the feature image, and performing adaptive thresholding processing according to the pixel values to obtain the adaptive threshold map;

Step S420, calculating feature distances between characters according to the pixel values, and segmenting different characters belonging to the same text region into the same segmented region according to the feature distances, so as to perform image segmentation processing on the adaptive threshold map and obtain the segmentation map.

Previous image segmentation methods generate a segmentation map that represents the probability of each pixel belonging to a text region. However, because text regions overlap and pixel predictions for text regions are inaccurate, the text regions in such a segmentation map cannot be separated from one another, and complex post-processing is required to obtain text bounding boxes from the segmentation map.

In this embodiment, pixel values are computed for the feature image and adaptive thresholding is applied to them, producing pixel-level predictions and hence an adaptive threshold map. The thresholds differ across positions within a text region: the threshold at the region's border is smaller than the threshold at its center. The feature distance between characters is then computed from the pixel values, and characters belonging to the same text region are assigned to the same segmentation region according to that distance, which performs the image segmentation of the adaptive threshold map and yields the segmentation map.
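A minimal sketch of steps S410 and S420, under stated assumptions: the per-position threshold here is derived from distance to the map border (the description only requires that border thresholds be lower than center thresholds), and the feature distance between characters is stood in for by a Manhattan distance between character centers.

```python
def adaptive_threshold_map(prob_map, t_border=0.3, t_center=0.7):
    """Per-position thresholds: low near the border, high at the center.
    Illustrative only: distance to the map border stands in for distance
    to the text-region border."""
    h, w = len(prob_map), len(prob_map[0])
    half = max(1, min(h, w) // 2)
    return [[t_border + (t_center - t_border) *
             min(1.0, min(x, y, w - 1 - x, h - 1 - y) / half)
             for x in range(w)] for y in range(h)]

def binarize(prob_map, thresh):
    """Pixel-level prediction: keep a pixel where its probability clears
    the locally adaptive threshold."""
    return [[1 if p >= t else 0 for p, t in zip(pr, tr)]
            for pr, tr in zip(prob_map, thresh)]

def feature_distance(a, b):
    """Stand-in feature distance: Manhattan distance between character centers."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def group_characters(centers, max_dist):
    """Single-link grouping: a character within max_dist of any member of a
    group joins that group, so one text region ends up in one segment."""
    groups = []
    for c in centers:
        hits = [g for g in groups
                if any(feature_distance(c, p) <= max_dist for p in g)]
        merged = [c] + [p for g in hits for p in g]
        groups = [g for g in groups if g not in hits] + [merged]
    return groups
```

With a uniform probability map, border pixels clear the low border threshold while center pixels fall below the high center threshold; nearby character centers collapse into one group while distant ones stay separate.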

In step S500, the corner candidate boxes and the segmentation map are jointly optimized to obtain a visualized text detection result, which includes both the position of the text in the image and its textual content.

The adaptive text detection method, apparatus, device, and medium of this application combine the ideas of regression-based and segmentation-based approaches, proposing an adaptive corner detection method and an adaptive threshold segmentation method: text of different scales is located with different numbers of corner points, a text segmentation map is generated with the adaptive threshold method, and the generated corner candidate boxes are jointly optimized with it to produce a visualized text detection result. The method can detect and recognize image text of arbitrary shape and in dense regions, improving the robustness of text detection.

An embodiment of this application further provides an adaptive text detection apparatus.

Referring to FIG. 5, the adaptive text detection apparatus comprises an input module 110, a feature extraction module 120, an adaptive corner detection module 130, an adaptive threshold segmentation module 140, and an output module 150.

The input module 110 acquires the text image to be detected and feeds it into the text detection network. The feature extraction module 120 extracts feature information of the text region from the text image and obtains a feature image from that information. The adaptive corner detection module 130 performs boundary detection on the feature image to obtain the boundary of the text region and locates that boundary with a plurality of corner points to generate the region candidate box of the text region, the number of corner points corresponding to the scale of the text region. The adaptive threshold segmentation module 140 performs adaptive thresholding on the feature image to obtain an adaptive threshold map and segments that map so that different characters belonging to the same text region fall into the same segmentation region, yielding a segmentation map. The output module 150 performs text detection by combining the region candidate map with the segmentation map, producing the text detection result.
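The five-module division can be wired together as below. The module internals here are placeholder callables, assumed purely for illustration; only the data flow (input, shared feature extraction, two parallel branches, joint output) follows the description above.

```python
class AdaptiveTextDetector:
    """Minimal wiring of the modules described above. The callables passed
    in are stand-ins for the real network components, not the patented
    implementations."""

    def __init__(self, extract_features, detect_corners, segment):
        self.extract_features = extract_features   # feature extraction module
        self.detect_corners = detect_corners       # adaptive corner detection branch
        self.segment = segment                     # adaptive threshold segmentation branch

    def detect(self, text_image):
        features = self.extract_features(text_image)       # shared feature image
        region_candidates = self.detect_corners(features)  # region candidate map
        segmentation = self.segment(features)              # segmentation map
        # Output module: combine both branches into the detection result.
        return {"regions": region_candidates, "segmentation": segmentation}
```

Because both branches consume the same feature image, the corner branch and the threshold branch can be trained and evaluated jointly, as the output module requires.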

It will be appreciated that the content of the adaptive text detection method embodiments applies to this apparatus embodiment: the apparatus implements the same functions as the method embodiments and achieves the same beneficial effects.

The adaptive text detection method of this application can be implemented with the PyTorch deep learning framework and run on an Ubuntu 20.04 system.

The adaptive text detection method can use the following datasets. The ICDAR2015 dataset consists of natural-scene text images from the ICDAR Robust Reading Competition, with 1000 training images and 500 test images; the images were captured with Google Glass and contain many blurry text regions with large variations in text scale, annotated at the word level. The Total-Text dataset consists of 1255 training images and 300 test images covering more than three text orientations (horizontal, multi-oriented, and curved), also annotated at the word level. The CTW1500 dataset contains 1000 training images and 500 test images with text at different scales, including multi-oriented, curved, and irregularly shaped text; its text regions are annotated with boundary points at the sentence level.

Detection uses a ResNet-50 backbone, on top of which a feature pyramid network serves as the feature extraction network to build multi-scale fused feature maps. The model is first pre-trained on the SynthText dataset with a batch size of 8 for 100K iterations, then fine-tuned on each of the three open datasets. All networks are trained with the Adam optimizer, with the learning rate set to 1×10⁻⁴.

Three performance metrics are used for evaluation: precision (P), recall (R), and the combined F-measure (F). P is the ratio of the number of correctly detected text regions (TP) to the total number of detection results (E); R is the ratio of TP to the total number of ground-truth text annotations (G); F is a combined metric of text detection performance computed from P and R.
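These metrics can be computed as follows. The document does not state the formula for F; the standard harmonic mean F = 2PR/(P+R) is assumed here.

```python
def precision_recall_f(tp, detections, ground_truths):
    """P = TP / E, R = TP / G; F is assumed to be the harmonic mean of P and R."""
    p = tp / detections if detections else 0.0
    r = tp / ground_truths if ground_truths else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f
```

For example, 80 correct detections out of 100 detections against 100 ground-truth regions gives P = R = F = 0.8.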

Table 1. Detection performance of the adaptive text detection method compared with other methods

[Table 1 appears as image BDA0004001275010000071 in the original publication; its headline figures are summarized in the following paragraph.]

As Table 1 shows, the adaptive text detection method of this embodiment achieves F-measures of 86.5%, 85.3%, and 84.2% on the ICDAR2015, Total-Text, and CTW1500 datasets respectively, improvements of 0.8%, 4.4%, and 2% over the PSENet network. Compared with the DBNet network, the F-measure improves by 0.6% and 0.8% on Total-Text and CTW1500 respectively.

An embodiment of this application further provides an electronic device. Referring to FIG. 6, the electronic device includes a memory 220, a processor 210, a program stored on the memory 220 and executable on the processor 210, and a data bus 230 that connects the processor 210 and the memory 220 for communication; when the program is executed by the processor 210, the adaptive text detection method described above is implemented.

The electronic device may be any intelligent terminal, including a tablet computer, a vehicle-mounted computer, and the like.

Regarding the hardware structure of the electronic device in general, the processor 210 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the relevant programs to realize the technical solutions provided by the embodiments of this application.

The memory 220 may be implemented as a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 220 may store an operating system and other application programs; when the technical solutions provided by the embodiments of this specification are implemented in software or firmware, the relevant program code is stored in the memory 220 and invoked by the processor 210 to execute the adaptive text detection method of the embodiments of this application.

The input/output interface is used for information input and output.

The communication interface enables this device to communicate and interact with other devices, either over a wired connection (e.g., USB or network cable) or wirelessly (e.g., mobile network, WiFi, or Bluetooth).

The bus 230 transfers information between the components of the device (e.g., the processor 210, the memory 220, the input/output interface, and the communication interface). The processor 210, the memory 220, the input/output interface, and the communication interface are communicatively connected to one another within the device through the bus 230.

An embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores computer-executable instructions that cause a computer to execute the adaptive text detection method described above.

Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or an appropriate combination thereof. Some or all of the physical components may be implemented as software executed by a processor (such as a central processing unit, digital signal processor, or microprocessor), as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data.

Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media. In the above description of this specification, reference to the terms "one embodiment/example", "another embodiment/example", "certain embodiments/examples", and the like means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of this application. In this specification, schematic expressions of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.

Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices, may be implemented as software, firmware, hardware, or an appropriate combination thereof.

The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may exist physically as separate units, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of this application. The aforementioned storage media include various media capable of storing programs, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect coupling or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

Although embodiments of this application have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principle and spirit of this application; the scope of this application is defined by the embodiments and their equivalents.

The above is a detailed description of preferred implementations of this application, but this application is not limited to these embodiments. Those skilled in the art may make various equivalent variations or substitutions without departing from the spirit of this application, and all such equivalent variations or substitutions fall within the scope defined by the embodiments of this application.

Claims (10)

1. An adaptive text detection method, comprising:
acquiring a character image to be detected, and inputting the character image to a character detection network;
extracting feature information of a character area from the character image through the character detection network, and obtaining a feature image according to the feature information;
carrying out boundary detection processing on the characteristic image to obtain the boundary of the character area, positioning the boundary of the character area through a plurality of corner points, generating a candidate area frame of the character area, and obtaining a candidate area image, wherein the number of the corner points corresponds to the shape of the character area;
carrying out self-adaptive thresholding on the characteristic image to obtain a self-adaptive threshold image, carrying out image segmentation on the self-adaptive threshold image, and segmenting different characters belonging to the same character region into the same segmentation region to obtain a segmentation image;
and combining the region candidate graph and the segmentation graph to perform character detection to obtain a character detection result.
2. The adaptive text detection method according to claim 1, wherein the extracting feature information of the text region from the text image, and obtaining the feature image according to the feature information comprises:
generating a plurality of basic feature mappings containing feature information of the character area according to the character image by a plurality of residual blocks of a backbone network of the character detection network;
generating a plurality of fusion feature maps with different scales by a feature pyramid network of the character detection network according to the plurality of basic feature maps;
and performing connection operation and up-sampling operation on the plurality of fusion feature maps to obtain a feature image.
3. The adaptive text detection method according to claim 2, wherein the fused feature map comprises a first feature map, a second feature map, a third feature map, a fourth feature map and a fifth feature map; the sizes of the first feature map, the second feature map, the third feature map, the fourth feature map and the fifth feature map are sequentially reduced; the first characteristic diagram, the second characteristic diagram, the third characteristic diagram, the fourth characteristic diagram and the fifth characteristic diagram have the same channel number.
4. The adaptive text detection method according to claim 3, wherein the performing a join operation and an upsample operation on the plurality of fused feature maps to obtain a feature image comprises:
and performing connection operation on the first feature diagram, the result of performing up-sampling operation on the second feature diagram, the result of performing up-sampling operation on the third feature diagram, the result of performing up-sampling operation on the fourth feature diagram and the result of performing up-sampling operation on the fifth feature diagram to obtain a feature image.
5. The method according to claim 1, wherein the performing a boundary detection process on the feature image to obtain a boundary of the text region, locating the boundary of the text region through a plurality of corner points, and generating a region candidate frame of the text region to obtain a region candidate map comprises:
carrying out boundary detection processing on the characteristic image, and representing the boundary of a character area in the characteristic image through an initial corner point;
calculating the offset of different corner points, and predicting the curvature according to the offset to obtain the curvature of the character area;
and increasing or decreasing corner points according to the curvature to position the boundary of the character area in the representation characteristic image, further positioning the boundary of the character area through a plurality of corner points, generating an area candidate frame of the character area, and obtaining an area candidate image.
6. The adaptive text detection method according to claim 1, wherein the adaptively thresholding the feature image to obtain an adaptive threshold map, and the image segmentation processing the adaptive threshold map to segment different texts belonging to the same text region into the same segmentation region to obtain a segmentation map comprises:
calculating a pixel value of the characteristic image, and performing self-adaptive thresholding processing according to the pixel value to obtain a self-adaptive threshold map;
and calculating the characteristic distance between the characters according to the pixel values, and dividing different characters belonging to the same character region into the same divided region according to the characteristic distance so as to perform image division processing on the self-adaptive threshold map to obtain a divided map.
7. The adaptive text detection method according to claim 6, wherein the thresholds of different positions of the text region of the adaptive threshold map are different, and the threshold of the boundary of the text region is smaller than the threshold of the center of the text region.
8. An adaptive text detection apparatus, comprising:
the input module is used for acquiring a character image to be detected and inputting the character image to a character detection network;
the characteristic extraction module is used for extracting characteristic information of the character area from the character image and obtaining a characteristic image according to the characteristic information;
the self-adaptive corner detection module is used for carrying out boundary detection processing on the characteristic image to obtain the boundary of the character area, positioning the boundary of the character area through a plurality of corners, generating a candidate area frame of the character area and obtaining a candidate area image, wherein the number of the corners corresponds to the proportion of the character area;
the self-adaptive threshold segmentation module is used for performing self-adaptive thresholding on the characteristic image to obtain a self-adaptive threshold image, performing image segmentation on the self-adaptive threshold image, and segmenting different characters belonging to the same character region into the same segmentation region to obtain a segmentation image;
and the output module is used for carrying out character detection by combining the region candidate graph and the segmentation graph to obtain a character detection result.
9. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the adaptive text detection method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the adaptive text detection method according to any one of claims 1 to 7.
CN202211618750.6A 2022-12-15 2022-12-15 Adaptive text detection method, device, equipment and medium Pending CN115953789A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211618750.6A CN115953789A (en) 2022-12-15 2022-12-15 Adaptive text detection method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN115953789A true CN115953789A (en) 2023-04-11

Family

ID=87288911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211618750.6A Pending CN115953789A (en) 2022-12-15 2022-12-15 Adaptive text detection method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115953789A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200089946A1 (en) * 2018-06-11 2020-03-19 Innoplexus Ag System and method for extracting tabular data from electronic document
CN115131797A (en) * 2022-06-28 2022-09-30 北京邮电大学 Scene text detection method based on feature enhancement pyramid network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MUHAMMAD TARIQ MAHMOOD: "Defocus Blur Segmentation Using Genetic Programming and Adaptive Threshold", Computers, Materials & Continua, vol. 70, no. 1, 1 January 2022, page 4867 *
CHEN Zeying (陈泽瀛): "A Text Detection Algorithm Based on Adaptive Non-Maximum Suppression", Digital Technology & Application (《数字技术与应用》), vol. 38, no. 3, 25 March 2020, pages 117-120 *

Similar Documents

Publication Publication Date Title
EP3745339B1 (en) Method for implanting advertisements in video, and computer device
CN110503097A (en) Training method, device and the storage medium of image processing model
US10699166B2 (en) Font attributes for font recognition and similarity
CN108229303B (en) Detection and recognition and detection and recognition network training method and device, equipment, medium
CN110288602B (en) Landslide extraction method, landslide extraction system and terminal
JP5713790B2 (en) Image processing apparatus, image processing method, and program
CN111461070B (en) Text recognition method, device, electronic equipment and storage medium
CN111488826A (en) Text recognition method and device, electronic equipment and storage medium
CN112016614A (en) Construction method of optical image target detection model, target detection method and device
CN111612003B (en) A method and device for extracting text from a picture
CN111626027A (en) Table structure restoration method, device, equipment, system and readable storage medium
CN115345895B (en) Image segmentation method and device for visual detection, computer equipment and medium
CN114511041B (en) Model training method, image processing method, apparatus, equipment and storage medium
CN111027554B (en) Commodity price tag text accurate detection positioning system and positioning method
US20230326035A1 (en) Target object segmentation method and related device
CN115392188A (en) Method and device for generating editable document based on non-editable image-text images
CN114926849A (en) Text detection method, device, equipment and storage medium
CN116612280A (en) Vehicle segmentation method, device, computer equipment and computer readable storage medium
JP5027201B2 (en) Telop character area detection method, telop character area detection device, and telop character area detection program
JP7433496B1 (en) Lane identification method, lane identification device, electronic device, storage medium and program
CN113159035A (en) Image processing method, device, equipment and storage medium
CN117726790A (en) An image-based weak texture scene recognition system, method, device and medium
CN112580624A (en) Method and device for detecting multidirectional text area based on boundary prediction
CN116229445A (en) Natural scene text detection method, system, storage medium and computing device
CN115953789A (en) Adaptive text detection method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination