CN113837168A - Image text detection and OCR recognition method, device and storage medium - Google Patents
- Publication number
- CN113837168A (application CN202111118174.4A)
- Authority
- CN
- China
- Prior art keywords
- text
- training
- image
- segmentation
- region
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4007—Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
Abstract
The invention relates to the technical field of data recognition, in particular to a method, a device and a storage medium for image text detection and OCR recognition. The method comprises the following steps: preprocessing the picture to obtain training data; extracting preliminary features of the training data to obtain a return result and building a training network according to the return result; calling the training network with the training model to train on the training data and obtain a plurality of text segmentation examples; and post-processing the text segmentation examples with a watershed segmentation method to complete detection and recognition. Through these steps, post-processing the text segmentation examples with the watershed segmentation method effectively reduces the algorithm time complexity to O(N). This solves the problem that the breadth-first algorithm in the PSENet pipeline, which performs a pixel-by-pixel four-neighborhood search and merge for each text segmentation example, drives the time complexity of the detection stage to O(N²), making detection slow and inefficient; the image processing speed and efficiency are thereby improved.
Description
Technical Field
The invention relates to the technical field of data recognition, in particular to a method, a device and a storage medium for image text detection and OCR recognition.
Background
The core idea of deep-learning OCR methods is basically a deep object-detection algorithm strategy. The progressive scale expansion network PSENet is a method based on instance segmentation: image feature extraction is carried out with a CNN-based backbone, then a series of feature down-sampling, feature fusion and up-sampling operations are carried out on the feature maps with a spatial-pyramid-like network to obtain a group of text segmentation examples of a predefined number, and finally the text examples are connected into regions with a breadth-first algorithm.
The patent CN110008950A, "A method for detecting text in shape-robust natural scenes" (application published 2019.07.12), discloses a method for detecting text in shape-robust natural scenes, comprising the following steps: step 1, preprocessing the training pictures in a text data set; step 2, building a PSENet progressive scale growth network, and using it to complete feature extraction, feature fusion and segmentation prediction of a training picture, obtaining segmentation results at a plurality of prediction scales; step 3, performing supervised training on the PSENet progressive scale growth network built in step 2 to obtain a detector model; step 4, detecting the picture to be detected; and step 5, obtaining the final detection result with a scale growth algorithm.
However, for images with many text detection targets and with misaligned and overlapping text regions, the breadth-first algorithm in the PSENet pipeline performs a pixel-by-pixel four-neighborhood search and merge for each text segmentation example, which can drive the time complexity of the detection stage to O(N²); detection is slow and efficiency is low.
Disclosure of Invention
In order to solve the problem that the breadth-first algorithm in the PSENet pipeline, performing a pixel-by-pixel four-neighborhood search and merge for each text segmentation example, drives the time complexity of the detection stage to O(N²), with slow detection and low efficiency:
The invention provides an image text detection and OCR recognition method, which comprises the following steps:
preprocessing the picture to obtain training data;
extracting the preliminary features of the training data to obtain a return result and building a training network according to the return result;
the training model calls the training network to train the training data to obtain a plurality of text segmentation examples;
and processing a plurality of text segmentation examples by a watershed segmentation method to finish detection and identification.
Further, in a preferred embodiment, a text region of the picture is labeled, and the picture labeled with the text region carries the original text coordinate labels; the original text coordinate labels are processed to generate a plurality of text segmentation kernels with similar shapes, the same center point and different sizes as training data for the training network.
Further, in a preferred embodiment, the training network is a PSENet forward network;
and extracting preliminary features of the training data by loading a feature extraction model to obtain a return result, inputting the return result into the PSENet forward network, and building the PSENet forward network with a feature pyramid network in a top-down manner.
Further, in a preferred embodiment, the training model invoking the training network to train the training data to obtain a plurality of text segmentation instances includes the following steps:
training preparation: setting a hyper-parameter, selecting an optimizer, and setting a mode for reading the training data into the training model;
training process: calling the PSENet forward network, computing the current loss by comparing the predictions with the real labels through the loss function, computing and updating the network parameter gradients with the optimizer, training iteratively until the desired precision is reached, and persisting the model;
and outputting a plurality of text segmentation examples after training is completed.
Further, in a preferred embodiment, a dice coefficient is used to define the loss function; samples with a poor detection effect are screened out according to the loss of the training data passed into the model, and the screened hard samples are then extracted, combined and trained with stochastic gradient descent.
Further, in a preferred embodiment, processing a plurality of the text segmentation instances by a watershed segmentation method to determine a final text line region and a final background region, includes the following steps:
acquiring a foreground image mark, a background image mark and an uncertain region;
and operating a watershed segmentation algorithm to process the uncertain area to obtain a final text line area and a final background area.
Further, in a preferred embodiment, the obtaining of the foreground image mark, the background image mark and the uncertain region comprises the following steps:
marking pixels inside the minimum text segmentation example as a foreground area, and setting the pixel value of the area to be 255;
marking pixels outside the maximum text segmentation instance as a background region and setting the pixel value of the region to 128;
the region between the minimum text segmentation instance and the maximum text segmentation instance is taken as an uncertain region, and the pixel value of the region is set to 0.
Further, in a preferred embodiment, the step of operating the watershed segmentation algorithm to process the uncertain region to obtain the final text line region and the final background region comprises the following steps:
sorting the pixels in the gradient image of the uncertain region to obtain the geodesic distance threshold of the watershed segmentation algorithm, and marking the minimum value of the uncertain region as the lowest point;
continuously increasing the geodesic distance and screening out the pixels smaller than the current geodesic distance value; if the distance from a screened pixel to the lowest point is smaller than the geodesic distance threshold, the pixel is submerged; otherwise, the gray value of the screened pixel is taken as a local threshold, i.e. a dam is built, completing the classification of the local region into text and non-text regions;
and the geodesic distance keeps increasing up to the maximum gray value, so that the separation of the text regions from the background is completed and the classification of all pixels is determined.
The invention also provides an image text detection and OCR recognition device, which comprises
a preprocessing module: used for preprocessing the picture to obtain training data;
a training network building module: used for extracting preliminary features of the training data to obtain a return result and building a training network according to the return result;
a training module: used for calling the training network with the training model to train on the training data and obtain a plurality of text segmentation examples;
a processing module: used for processing a plurality of the text segmentation examples with a watershed segmentation algorithm to complete detection and recognition.
The invention also provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement any one of the image text detection and OCR recognition methods described above.
Compared with the prior art, in the image text detection and OCR recognition method provided by the invention, through the above steps the watershed segmentation method replaces the breadth-first search (BFS) algorithm of the original PSENet algorithm for post-processing the text segmentation examples, effectively reducing the algorithm time complexity to O(N). This solves the problem that the breadth-first algorithm in the PSENet pipeline, performing a pixel-by-pixel four-neighborhood search and merge for each text segmentation example, drives the time complexity of the detection stage to O(N²), with slow detection and low efficiency, thereby improving the image processing speed and efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a step diagram of an image text detection and OCR recognition method provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc., indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Specific examples are given below:
In the invention, medical bill images are taken as an example of images with many text detection targets and with misaligned and overlapping text regions. Because a large amount of medical bill data has been accumulated, retraining can start directly rather than migrating the parameters of a pre-trained model for further training, so the method can be used for model training in a train-from-scratch mode.
An image text detection and OCR recognition method comprises the following steps:
preprocessing the picture to obtain training data;
extracting the preliminary features of the training data to obtain a return result and building a training network according to the return result;
the training model calls the training network to train the training data to obtain a plurality of text segmentation examples;
and processing a plurality of text segmentation examples by a watershed segmentation method to finish detection and identification.
Compared with the prior art, in the image text detection and OCR recognition method provided by the invention, through the above steps the watershed segmentation method replaces the breadth-first search (BFS) algorithm of the original PSENet algorithm for post-processing the text segmentation examples, effectively reducing the algorithm time complexity to O(N). This solves the problem that the breadth-first algorithm in the PSENet pipeline, performing a pixel-by-pixel four-neighborhood search and merge for each text segmentation example, drives the time complexity of the detection stage to O(N²), with slow detection and low efficiency, thereby improving the image processing speed and efficiency.
Specifically, the picture is preprocessed to obtain training data. The picture may be one shot in a natural scene; a text region of the picture is labeled, and the picture labeled with text regions carries the original text coordinate labels. A text region is a region containing text; the labeling may be done manually or by computer, and the labels are polygon coordinates, which may be the coordinates of the four corner points of a rectangular box;
according to the requirement of progressive scale expansion, the original text coordinate labels are processed with the Vatti clipping algorithm to generate a plurality of text segmentation kernels with similar shapes, the same center point and different sizes as training data for the training network.
Specifically, shrinking the original text coordinate labels with the Vatti clipping algorithm to obtain the text segmentation kernels comprises the following steps:
Calculating the shrink distance from the area and perimeter of the largest text segmentation kernel and the shrink ratio:
d_i = Area(p_1) × (1 − r_i²) / Perimeter(p_1)
In implementation, the original text coordinate label is shrunk to obtain a sequence of text segmentation kernels p_1, p_2, …, p_n, where the largest text segmentation kernel (i.e. the original one) is p_1, r_i is the shrink ratio of any text segmentation kernel p_i relative to the largest kernel p_1, d_i is the corresponding shrink distance, and Area and Perimeter are the area and perimeter of the largest text segmentation kernel;
calculating the shrink ratio from the number of text segmentation kernels and the reduction scale:
r_i = 1 − (1 − m) × (i − 1) / (n − 1)
where m is the reduction scale with range (0, 1) and n is the number of text segmentation examples, i.e. the number of text segmentation kernels; both m and n are hyper-parameters of the PSENet algorithm;
calculating the shrunk versions of the original text coordinate labels through the shrink formulas above to obtain a plurality of text segmentation kernels, which serve as the original input training data of the training network; the shrink formulas are the formulas for the shrink distance and the shrink ratio given above.
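As a small sketch, the shrink ratio and shrink distance can be computed directly in Python. The formulas follow the PSENet paper, adapted here under the assumption that p_1 is the original, largest kernel; the actual polygon shrinking is then performed with the Vatti clipping algorithm, e.g. via a polygon-clipping library.

```python
def shrink_ratios(m, n):
    """Shrink ratio r_i for kernels p_1..p_n: r_1 = 1 (the original,
    largest kernel) and r_n = m, decreasing in equal steps."""
    return [1 - (1 - m) * (i - 1) / (n - 1) for i in range(1, n + 1)]

def shrink_distance(area, perimeter, r_i):
    """Shrink distance d_i = Area * (1 - r_i**2) / Perimeter, where Area
    and Perimeter belong to the largest kernel p_1."""
    return area * (1 - r_i ** 2) / perimeter
```

With m = 0.5 and n = 6 (the values used later in this document), the ratios run from 1.0 down to 0.5 in equal steps of 0.1.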
Specifically, in the step of extracting the preliminary features of the training data to obtain a return result, and building a training network according to the return result:
the training network is a PSENet forward network; the feature extraction model is, but is not limited to, ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, VGG-16, VGG-19, ShuffleNet or MobileNet. Preferably, the ResNet-152 model is selected: ResNet-152 is a network with a deeper structure that can extract more effective features and has better precision;
the preliminary features of the training data are extracted by loading a ResNet-152 model in PyTorch to obtain a return result, the return result is input into the PSENet forward network, and the PSENet forward network is built with a feature pyramid network in a top-down manner. The process by which the ResNet-152 model extracts the preliminary features of the training data and obtains the returned result is prior art and is not described again.
Specifically, inputting the returned results [c2, c3, c4, c5] into the PSENet forward network and building the PSENet training network with a feature pyramid network in a top-down manner comprises the following steps:
(1) p5 top-layer processing:
c5 → p5: 3×3 convolution, BN, ReLU activation;
(2) p4 upsampling:
c4 → c4l: 2×2 convolution, BN, ReLU activation;
[p5, c4l] → p4: bilinear_interpolation(p5) + c4l;
(3) p4 smoothing:
p4 → p4: size-preserving convolution, BN, ReLU activation;
(4) p3 upsampling and smoothing:
c3 → c3l: 1×1 convolution, BN, ReLU activation;
[p4, c3l] → p3: bilinear_interpolation(p4) + c3l;
the smoothing is the same as for p4;
(5) p2 upsampling and smoothing:
c2 → c2l: size-preserving convolution, BN, ReLU activation;
[p3, c2l] → p2: bilinear_interpolation(p3) + c2l;
the smoothing is the same as for p4;
(6) upsampling and concatenation:
taking the size of p2 as the reference, p3 to p5 are bilinearly interpolated to the size of p2, and p2 to p5 are then concatenated along the channel dimension with a concatenate operation, completing the construction of the PSENet forward network.
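The top-down merge and final concatenation can be sketched with plain NumPy as a simplified stand-in for the framework ops; the function names `upsample_add` and `fuse` are illustrative, and the real network additionally applies convolution, BN and ReLU at each step.

```python
import numpy as np

def bilinear_resize(x, out_h, out_w):
    """Bilinearly resize a (C, H, W) feature map (align-corners style)."""
    c, h, w = x.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]  # vertical interpolation weights
    wx = (xs - x0)[None, None, :]  # horizontal interpolation weights
    top = x[:, y0][:, :, x0] * (1 - wx) + x[:, y0][:, :, x1] * wx
    bot = x[:, y1][:, :, x0] * (1 - wx) + x[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy

def upsample_add(p_coarse, c_lateral):
    """[p, cl] -> bilinear_interpolation(p) + cl, as in steps (2), (4), (5)."""
    _, h, w = c_lateral.shape
    return bilinear_resize(p_coarse, h, w) + c_lateral

def fuse(p2, p3, p4, p5):
    """Step (6): resize p3..p5 to p2's size, concatenate on channels."""
    _, h, w = p2.shape
    return np.concatenate(
        [p2] + [bilinear_resize(p, h, w) for p in (p3, p4, p5)], axis=0)
```

On a batch of real feature maps the same sequence is applied per image; here a constant map upsampled and added to a lateral map simply sums the two constants, which makes the arithmetic easy to check.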
Specifically, the training model calls a training network to train training data to obtain a plurality of text segmentation examples, and the method comprises the following steps:
training preparation: setting a hyper-parameter, selecting an optimizer, and setting a mode of reading training data into a training model;
the optimization method comprises the following steps that hyper-parameters comprise completion of learning rate and decay tasks, segmentation examples, batch _ size and epoch, an optimizer selects but is not limited to SGD and Adam, Adam is selected, Adam has the advantages of being capable of dynamically adjusting learning rate and the like, and training data are read into a training model through a generator function batch;
training process: calling the PSENet forward network, computing the current loss by comparing the model predictions with the real labels through the loss function, computing and updating the network parameter gradients with the optimizer, training iteratively until the desired precision is reached, and persisting the model:
Specifically, training proceeds in units of epochs; each epoch trains once over all the data, sub-batch by sub-batch (ignoring boundary issues). Each batch of data is passed into the model, the PSENet forward network is called, the predictions are compared with the real labels, the current loss is computed by the loss function, the network parameter gradients are computed and updated with the Adam optimizer, and training iterates until the desired precision is reached, after which the model is persisted. Through continuous model iteration, each model prediction is compared with the real labels; if the prediction basically agrees with the real labels, for example when the prediction precision reaches 95%, the model parameters at that moment are saved, i.e. stored persistently.
In the text detection of medical bills, the negative-sample area is far larger than the positive-sample area, so the loss function is defined with the dice coefficient; samples with a poor detection effect are screened out according to the loss of the training data passed into the model, and the screened hard samples are extracted, combined and trained with stochastic gradient descent. The dice coefficient is:
D(S, G) = 2 Σ_{x,y} (S_{x,y} G_{x,y}) / (Σ_{x,y} S_{x,y}² + Σ_{x,y} G_{x,y}²)
where S_{x,y} is the value of the predicted pixel and G_{x,y} is the value of the pixel in the real label.
The loss function is defined as L = λL_C + (1 − λ)L_S, where L_C is the classification loss of the text regions and L_S is the loss of the shrunk text examples, with
L_C = 1 − D(S_n × M, G_n × M)
where M is a 0/1 byte mask generated by an online hard example mining algorithm; samples with a poor detection effect are screened out according to the loss of the training data passed into the model, and the screened samples are then extracted, combined and trained with Adam.
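The dice coefficient and the composite loss above can be sketched in a few lines of NumPy. This is a minimal illustration, not the training implementation: the function names are illustrative, the shrunk-kernel loss L_S is supplied by the caller, and λ = 0.7 is the balance value used in the PSENet paper, assumed here.

```python
import numpy as np

def dice_coefficient(S, G, eps=1e-6):
    """D(S, G) = 2 * sum(S*G) / (sum(S^2) + sum(G^2)), summed over pixels."""
    S = S.astype(np.float64)
    G = G.astype(np.float64)
    return 2.0 * (S * G).sum() / ((S ** 2).sum() + (G ** 2).sum() + eps)

def psenet_style_loss(S_n, G_n, M, L_s, lam=0.7):
    """L = lam * L_C + (1 - lam) * L_S, with L_C = 1 - D(S_n*M, G_n*M).
    M is the 0/1 OHEM mask restricting the classification loss to the
    pixels kept by online hard example mining."""
    L_c = 1.0 - dice_coefficient(S_n * M, G_n * M)
    return lam * L_c + (1.0 - lam) * L_s
```

A perfect prediction gives D = 1 and thus L_C = 0; a completely disjoint prediction gives D = 0 and L_C = 1.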
Specifically, the method for processing a plurality of text segmentation examples by a watershed segmentation method to determine a final text line region and a final background region comprises the following steps:
firstly, obtaining a foreground image mark, specifically marking pixels inside a minimum text segmentation example as a foreground area, and setting the pixel value of the area to be 255; acquiring a background image mark, specifically, marking a pixel outside a maximum text segmentation example as a background area, and setting a pixel value of the area as 128; and acquiring an uncertain region, specifically, taking a region between the minimum text segmentation example and the maximum text segmentation example as the uncertain region, and setting the pixel value of the region to be 0.
Secondly, operating a watershed segmentation algorithm to process the uncertain region to obtain a final text line region and a final background region, specifically comprising the following steps:
sorting the pixels in the gradient image of the uncertain region to obtain the geodesic distance threshold of the watershed segmentation algorithm, and marking the minimum value of the uncertain region as the lowest point; specifically, the geodesic distance threshold of the watershed algorithm is obtained by running the OTSU algorithm;
continuously increasing the geodesic distance and screening out the pixels smaller than the current geodesic distance value; if the distance from a screened pixel to the lowest point is smaller than the geodesic distance threshold, the pixel is submerged; otherwise, the gray value of the pixel is taken as a local threshold, i.e. a dam is built, completing the classification of the local region into text and non-text regions;
the geodesic distance keeps increasing up to the maximum gray value, all regions meet at the watershed lines, the separation of the text regions from the background is completed, the classification of all pixels is determined, and the final text line regions and the final background region are obtained.
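In practice an optimized implementation such as OpenCV's `cv2.watershed` would run this flooding; as an illustration only, the procedure can be reduced to a marker-driven priority flood in pure Python (explicit dam pixels are omitted and labels simply propagate in order of increasing gray value):

```python
import heapq

def watershed_flood(gray, markers):
    """gray: 2-D list of gray values; markers: 2-D list with 0 for
    uncertain pixels and positive integers for seeded regions.
    Floods outward from the seeds in order of increasing gray value
    and returns the label assigned to every pixel."""
    h, w = len(gray), len(gray[0])
    labels = [row[:] for row in markers]
    heap, counter = [], 0  # counter breaks ties deterministically
    for y in range(h):
        for x in range(w):
            if labels[y][x] != 0:
                heapq.heappush(heap, (gray[y][x], counter, y, x))
                counter += 1
    while heap:
        _, _, y, x = heapq.heappop(heap)
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-neighborhood
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == 0:
                labels[ny][nx] = labels[y][x]  # this basin submerges it
                heapq.heappush(heap, (gray[ny][nx], counter, ny, nx))
                counter += 1
    return labels
```

On a one-row image with a gray-value ridge in the middle, the two seeded basins flood toward the ridge and split the uncertain pixels between them.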
According to the content of the invention, in implementation m is set to 0.5 and n to 6, ResNet-152 is selected as the feature extraction network, M is set to 3, the training data are read into the model for training through a generator function in batches, and the Adam optimizer is adopted. During training, the input image dimensions are [B, 3, H, W], corresponding respectively to batch_size, the number of image channels, and the height and width of the image;
with the number of text segmentation examples set to 6, the batch of training image feature maps goes through image down-sampling, feature fusion and image up-sampling, and a batch with the same size as the original images, namely [B, 6, H, W], is output; for each text line of each image, 6 text segmentation results S_1, S_2, …, S_6 are generated;
Test experiments were carried out with medical bills (including outpatient invoices and hospitalization invoices), 1000 pictures of each type; the graphics card of the test equipment was a Tesla V100 with 32 GB of video memory. In the experiments, all pictures were limited to 1000 pixels on the shortest side.
Under the same conditions, the original BFS algorithm of PSENet is replaced by the watershed segmentation algorithm: the union of the minimum segmentation results S_6 of all text regions is taken as the watershed confident-foreground marker image, the negation of the maximum segmentation result S_1 is taken as the watershed confident-background marker image, and S_2, S_3, …, S_5 are treated as the uncertain region of the algorithm:
For the original PSENet algorithm, the accuracy reaches 92.37% and the FPS (the number of pictures processed by the model per second, including data pre-processing and post-processing) reaches 11; the accuracy of the present method reaches 92.51% and its FPS reaches 48. Obviously, while preserving precision, the processing speed of the present method is more than 4 times that of the original PSENet algorithm. Compared with the prior art, in the image text detection and OCR recognition method provided by the invention, replacing the breadth-first search (BFS) algorithm of the original PSENet algorithm with the watershed segmentation method for post-processing the text segmentation examples effectively reduces the algorithm time complexity to O(N); this solves the problem that the breadth-first algorithm in the PSENet pipeline, performing a pixel-by-pixel four-neighborhood search and merge for each text segmentation example, drives the time complexity of the detection stage to O(N²), with slow detection and low efficiency, thereby improving the image processing speed and efficiency.
The invention also provides an image text detection and OCR recognition system, which comprises a preprocessing module, used for preprocessing the picture to obtain training data; a training network building module, used for extracting preliminary features of the training data to obtain a return result and building a training network according to the return result; a training module, used for calling the training network with the training model to train on the training data and obtain a plurality of text segmentation examples; and a processing module, used for processing a plurality of the text segmentation examples with a watershed segmentation algorithm to complete detection and recognition. The image text detection and OCR recognition system provided by the invention improves the image processing speed and efficiency.
The present invention also provides a computer-readable storage medium having computer instructions stored thereon which, when executed by a processor, implement an image text detection and OCR recognition method as described in any of the above.
In specific implementation, the computer-readable storage medium is a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the computer-readable storage medium may also include a combination of the above kinds of memory.
Although terms such as training data, preliminary features, training network, training model, text segmentation example, and watershed segmentation are used frequently herein, the possibility of using other terms is not excluded. These terms are used merely to describe and explain the nature of the present invention more conveniently; construing them as imposing any additional limitation would be contrary to the spirit of the present invention.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention and not to limit it; while the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced, and such modifications or substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. An image text detection and OCR recognition method, characterized in that it comprises the following steps:
preprocessing the picture to obtain training data;
extracting the preliminary features of the training data to obtain a return result and building a training network according to the return result;
the training model calls the training network to train the training data to obtain a plurality of text segmentation examples;
and processing a plurality of text segmentation examples by a watershed segmentation method to finish detection and identification.
2. The image text detection and OCR recognition method according to claim 1, characterized in that: a text region of the picture is marked, the picture marked with the text region being the original text coordinate label; and the original text coordinate label is processed to generate a plurality of text segmentation kernels with similar shapes, the same center point and different sizes, as the training data of the training network.
3. The image text detection and OCR recognition method according to claim 1, characterized in that: the training network is a PSENet forward network;
and the preliminary features of the training data are extracted by loading a feature extraction model to obtain a return result, the return result is input into the PSENet forward network, and the PSENet forward network is built with a feature pyramid network (FPN) in a top-down manner.
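The top-down construction referred to in claim 3 can be illustrated with a minimal sketch of FPN-style feature fusion. This is a deliberately simplified toy (plain nested lists stand in for feature maps, and the 1x1 lateral and 3x3 smoothing convolutions of a real FPN are omitted), not the claimed network itself:

```python
def fpn_top_down(features):
    """Toy FPN-style top-down merge: repeatedly upsample the coarser
    feature map 2x (nearest neighbour) and add it to the next finer map.
    `features` is ordered fine-to-coarse; lateral and smoothing
    convolutions are omitted for brevity."""
    merged = [features[-1]]  # start from the coarsest (topmost) map
    for fine in reversed(features[:-1]):
        coarse = merged[0]
        # nearest-neighbour 2x upsample: duplicate every row and column
        up = [[v for v in row for _ in (0, 1)]
              for row in coarse for _ in (0, 1)]
        fused = [[fine[y][x] + up[y][x] for x in range(len(fine[0]))]
                 for y in range(len(fine))]
        merged.insert(0, fused)
    return merged
```

In the real network each merged map would additionally pass through a smoothing convolution before the segmentation heads are attached.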
4. The image text detection and OCR recognition method according to claim 3, characterized in that: the training model calling the training network to train the training data to obtain a plurality of text segmentation examples comprises the following steps:
training preparation: setting the hyper-parameters, selecting an optimizer, and setting the mode for reading the training data into the training model;
training process: calling the PSENet forward network, calculating the current loss by comparing its output with the real labels through the loss function, using the optimizer to calculate and update the network parameter gradients, iterating the training until the desired precision is reached, and persisting the model;
and outputting a plurality of text segmentation examples after training is completed.
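The training loop of claim 4 can be sketched generically. The model below is a deliberately trivial stand-in (a single weight `w` fitted by gradient descent on a squared error); the learning rate, stopping precision, and pickle-based persistence are illustrative assumptions, not the invention's actual optimizer or network:

```python
import pickle
import random

def train_model(w, data, lr=0.1, target_loss=0.01, max_iters=1000,
                path="model.pkl"):
    """Generic sketch of the claimed loop: compute the loss against the
    real labels, update the parameter by its gradient, iterate until the
    desired precision is reached, then persist the model."""
    for _ in range(max_iters):
        x, label = random.choice(data)       # read training data
        loss = (w * x - label) ** 2          # compare with the real label
        if loss < target_loss:               # desired precision reached
            break
        w -= lr * 2 * (w * x - label) * x    # optimizer step (gradient descent)
    with open(path, "wb") as f:              # persist the trained model
        pickle.dump(w, f)
    return w
```

A real run would swap in the PSENet forward pass, the dice-based loss, and a framework optimizer, but the control flow is the same.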
5. The image text detection and OCR recognition method according to claim 4, characterized in that: the loss function is defined using the dice coefficient; samples with poor detection results are screened out according to the loss of the training data fed into the model; and the screened hard samples are extracted, combined, and trained with stochastic gradient descent.
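The dice-coefficient loss of claim 5 can be written out directly; the sketch below works on flattened probability maps and adds a small epsilon for numerical stability (the epsilon and the flattened-list interface are illustrative assumptions):

```python
def dice_loss(pred, target, eps=1e-6):
    """Dice loss: 1 - 2|P ∩ T| / (|P| + |T|). Because it normalises by
    the total mass of prediction and target, it is robust to the
    foreground/background imbalance typical of text segmentation maps."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (total + eps)
```

Per-sample losses computed this way can then be ranked to screen out the worst-detected samples for the stochastic-gradient-descent step described in the claim.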
6. The image text detection and OCR recognition method according to claim 1, characterized in that: processing the plurality of text segmentation examples by the watershed segmentation method to determine the final text line region and the final background region comprises the following steps:
acquiring a foreground image mark, a background image mark and an uncertain region;
and operating a watershed segmentation algorithm to process the uncertain area to obtain a final text line area and a final background area.
7. The image text detection and OCR recognition method according to claim 6, characterized in that: acquiring the foreground image mark, the background image mark and the uncertain region comprises the following steps:
marking pixels inside the minimum text segmentation example as a foreground area, and setting the pixel value of the area to be 255;
marking pixels outside the maximum text segmentation instance as a background region and setting the pixel value of the region to 128;
the region between the minimum text segmentation instance and the maximum text segmentation instance is taken as an uncertain region, and the pixel value of the region is set to 0.
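The three-level marker map of claim 7 is straightforward to build; the sketch below assumes the kernels are given as nested 0/1 lists of equal size:

```python
def build_markers(min_kernel, max_kernel):
    """Build the marker map of claim 7:
    255 = definite text (inside the smallest segmentation kernel),
    128 = definite background (outside the largest kernel),
      0 = the uncertain ring in between, left for the watershed."""
    h, w = len(min_kernel), len(min_kernel[0])
    markers = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if min_kernel[y][x]:
                markers[y][x] = 255      # foreground mark
            elif not max_kernel[y][x]:
                markers[y][x] = 128      # background mark
    return markers
```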
8. The image text detection and OCR recognition method according to claim 6, characterized in that: running the watershed segmentation algorithm to process the uncertain region to obtain the final text line region and the final background region comprises the following steps:
sorting the pixels in the gradient image of the uncertain region to obtain the geodesic distance threshold of the watershed segmentation algorithm, and marking the minimum value of the uncertain region as the lowest point;
continuously increasing the geodesic distance and screening out the pixels smaller than the current geodesic distance value; if the distance from a screened pixel to the lowest point is smaller than the geodesic distance threshold, the pixel is submerged; otherwise, the gray value of the screened pixel is taken as a local threshold, i.e. a dam is built, completing the classification of the local region into text and non-text;
and continuing to increase the geodesic distance up to the maximum gray value, thereby completing the separation of the text region from the background and the classification attribution of all pixels.
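The flooding described in claims 6-8 can be sketched with a priority queue: pixels are claimed in order of increasing gray value, which is the standard marker-based watershed formulation. This is a simplified stand-in (integer seed labels instead of the 255/128 marker values, gray values as the geodesic-distance surrogate, and implicit dams, since each uncertain pixel is claimed by exactly one basin):

```python
import heapq

def watershed_flood(seed_labels, text_mask, gray):
    """Marker-based watershed on nested lists: grow each non-zero seed
    label over the uncertain pixels of `text_mask` (value 1, label 0),
    visiting pixels in order of increasing gray value so every uncertain
    pixel is claimed by the basin that floods it first."""
    h, w = len(gray), len(gray[0])
    labels = [row[:] for row in seed_labels]
    heap = [(gray[y][x], y, x)
            for y in range(h) for x in range(w) if labels[y][x]]
    heapq.heapify(heap)
    while heap:
        g, y, x = heapq.heappop(heap)
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w \
                    and text_mask[ny][nx] and not labels[ny][nx]:
                labels[ny][nx] = labels[y][x]   # submerged by this basin
                # keep flooding from the higher of the two gray levels
                heapq.heappush(heap, (max(g, gray[ny][nx]), ny, nx))
    return labels
```

Because every pixel enters the queue at most once, the pass is O(N log N) with a binary heap and effectively linear with the classical sorted-bucket (Vincent-Soille) variant, which is the complexity reduction claimed over the pixel-by-pixel BFS merge.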
9. An image text detection and OCR recognition device, characterized in that it comprises:
a preprocessing module, used for preprocessing the picture to obtain training data;
a training network building module, used for extracting the preliminary features of the training data to obtain a return result and building a training network according to the return result;
a training module, used for calling the training network with a training model to train the training data to obtain a plurality of text segmentation examples; and
a processing module, used for processing the plurality of text segmentation examples through a watershed segmentation algorithm to complete detection and recognition.
10. A computer-readable storage medium, characterized in that: the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the image text detection and OCR recognition method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111118174.4A CN113837168A (en) | 2021-09-22 | 2021-09-22 | Image text detection and OCR recognition method, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111118174.4A CN113837168A (en) | 2021-09-22 | 2021-09-22 | Image text detection and OCR recognition method, device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113837168A true CN113837168A (en) | 2021-12-24 |
Family
ID=78969694
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111118174.4A Pending CN113837168A (en) | 2021-09-22 | 2021-09-22 | Image text detection and OCR recognition method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113837168A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116630755A (en) * | 2023-04-10 | 2023-08-22 | 雄安创新研究院 | Method, system and storage medium for detecting text position in scene image |
CN116863482A (en) * | 2023-09-05 | 2023-10-10 | 华立科技股份有限公司 | A transformer detection method, device, equipment and storage medium |
CN116935394A (en) * | 2023-07-27 | 2023-10-24 | 南京邮电大学 | Train carriage number positioning method based on PSENT region segmentation |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011128070A (en) * | 2009-12-18 | 2011-06-30 | Hitachi High-Technologies Corp | Image processing device, measuring/testing system, and program |
CN102725773A (en) * | 2009-12-02 | 2012-10-10 | 惠普发展公司,有限责任合伙企业 | System and method of foreground-background segmentation of digitized images |
US20150078648A1 (en) * | 2013-09-13 | 2015-03-19 | National Cheng Kung University | Cell image segmentation method and a nuclear-to-cytoplasmic ratio evaluation method using the same |
CN110008950A (en) * | 2019-03-13 | 2019-07-12 | 南京大学 | A Shape-Robust Approach for Text Detection in Natural Scenes |
WO2019192397A1 (en) * | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape |
CN110766008A (en) * | 2019-10-29 | 2020-02-07 | 北京华宇信息技术有限公司 | Text detection method facing any direction and shape |
CN111145209A (en) * | 2019-12-26 | 2020-05-12 | 北京推想科技有限公司 | Medical image segmentation method, device, equipment and storage medium |
CN111738256A (en) * | 2020-06-02 | 2020-10-02 | 上海交通大学 | Composite CT image segmentation method based on improved watershed algorithm |
CN111798480A (en) * | 2020-07-23 | 2020-10-20 | 北京思图场景数据科技服务有限公司 | Character detection method and device based on single character and character connection relation prediction |
US20210034700A1 (en) * | 2019-07-29 | 2021-02-04 | Intuit Inc. | Region proposal networks for automated bounding box detection and text segmentation |
- 2021-09-22 CN CN202111118174.4A patent/CN113837168A/en active Pending
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102725773A (en) * | 2009-12-02 | 2012-10-10 | 惠普发展公司,有限责任合伙企业 | System and method of foreground-background segmentation of digitized images |
JP2011128070A (en) * | 2009-12-18 | 2011-06-30 | Hitachi High-Technologies Corp | Image processing device, measuring/testing system, and program |
US20150078648A1 (en) * | 2013-09-13 | 2015-03-19 | National Cheng Kung University | Cell image segmentation method and a nuclear-to-cytoplasmic ratio evaluation method using the same |
WO2019192397A1 (en) * | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape |
CN110008950A (en) * | 2019-03-13 | 2019-07-12 | 南京大学 | A Shape-Robust Approach for Text Detection in Natural Scenes |
US20210034700A1 (en) * | 2019-07-29 | 2021-02-04 | Intuit Inc. | Region proposal networks for automated bounding box detection and text segmentation |
CN110766008A (en) * | 2019-10-29 | 2020-02-07 | 北京华宇信息技术有限公司 | Text detection method facing any direction and shape |
CN111145209A (en) * | 2019-12-26 | 2020-05-12 | 北京推想科技有限公司 | Medical image segmentation method, device, equipment and storage medium |
CN111738256A (en) * | 2020-06-02 | 2020-10-02 | 上海交通大学 | Composite CT image segmentation method based on improved watershed algorithm |
CN111798480A (en) * | 2020-07-23 | 2020-10-20 | 北京思图场景数据科技服务有限公司 | Character detection method and device based on single character and character connection relation prediction |
Non-Patent Citations (3)
Title |
---|
WENHAI WANG等: "Shape Robust Text Detection with Progressive Scale Expansion Network", 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), pages 9328 - 9337 * |
程序员阿德 [Programmer Ade]: "Classic image segmentation algorithms: the watershed algorithm", pages 1 - 7, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/67741538?utm_id=0> (Zhihu) *
运动小爽 [Yundong Xiaoshuang]: "Using watershed as the post-processing of PSENet", page 1, Retrieved from the Internet <URL:https://www.jianshu.com/p/ed750a1c488c?utm_campaign=maleskine&utm_content=note&utm_medium=seo_notes&utm_source=recommendation> (Jianshu) *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116630755A (en) * | 2023-04-10 | 2023-08-22 | 雄安创新研究院 | Method, system and storage medium for detecting text position in scene image |
CN116630755B (en) * | 2023-04-10 | 2024-04-02 | 雄安创新研究院 | Method, system and storage medium for detecting text position in scene image |
CN116935394A (en) * | 2023-07-27 | 2023-10-24 | 南京邮电大学 | Train carriage number positioning method based on PSENT region segmentation |
CN116935394B (en) * | 2023-07-27 | 2024-01-02 | 南京邮电大学 | Train carriage number positioning method based on PSENT region segmentation |
CN116863482A (en) * | 2023-09-05 | 2023-10-10 | 华立科技股份有限公司 | A transformer detection method, device, equipment and storage medium |
CN116863482B (en) * | 2023-09-05 | 2023-12-19 | 华立科技股份有限公司 | A transformer detection method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110930397B (en) | Magnetic resonance image segmentation method and device, terminal equipment and storage medium | |
EP3620979B1 (en) | Learning method, learning device for detecting object using edge image and testing method, testing device using the same | |
CN110428428B (en) | An image semantic segmentation method, electronic device and readable storage medium | |
Abdollahi et al. | Improving road semantic segmentation using generative adversarial network | |
CN113837168A (en) | Image text detection and OCR recognition method, device and storage medium | |
CN113111871B (en) | Training method and device of text recognition model, text recognition method and device | |
CN107480726A (en) | A kind of Scene Semantics dividing method based on full convolution and shot and long term mnemon | |
CN113139543B (en) | Training method of target object detection model, target object detection method and equipment | |
CN114419570B (en) | Point cloud data identification method and device, electronic equipment and storage medium | |
CN108280455B (en) | Human body key point detection method and apparatus, electronic device, program, and medium | |
EP3813661A1 (en) | Human pose analysis system and method | |
CN111369581A (en) | Image processing method, device, equipment and storage medium | |
CN108009554A (en) | A kind of image processing method and device | |
CN114821778A (en) | A method and device for dynamic recognition of underwater fish body posture | |
CN112991280B (en) | Visual detection method, visual detection system and electronic equipment | |
CN111899259A (en) | Prostate cancer tissue microarray classification method based on convolutional neural network | |
CN116152171A (en) | Intelligent construction target counting method, electronic equipment and storage medium | |
CN112991281B (en) | Visual detection method, system, electronic equipment and medium | |
CN112241736A (en) | Text detection method and device | |
CN111967408B (en) | Low-resolution pedestrian re-identification method and system based on prediction-recovery-identification | |
CN114511702A (en) | Remote sensing image segmentation method and system based on multi-scale weighted attention | |
Samudrala et al. | Semantic segmentation in medical image based on hybrid Dlinknet and UNet | |
CN116823761A (en) | Information processing methods, devices, equipment and storage media based on cell segmentation | |
CN117218481A (en) | Fish identification method, device, equipment and storage medium | |
CN116310832A (en) | Remote sensing image processing method, device, equipment, medium and product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||