
CN101593276B - Video OCR image-text separation method and system - Google Patents


Info

Publication number
CN101593276B
Authority
CN
China
Prior art keywords
image
stroke
binary image
pixel
extracts
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008101136592A
Other languages
Chinese (zh)
Other versions
CN101593276A (en)
Inventor
禹晶
黄磊
刘昌平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hanwang Technology Co Ltd
Original Assignee
Hanwang Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hanwang Technology Co Ltd
Priority to CN2008101136592A
Publication of CN101593276A
Application granted
Publication of CN101593276B

Landscapes

  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video optical character recognition (OCR) image-text separation method and system, addressing the problem that existing recognition methods based on the bilateral model cannot extract complete strokes when stroke thickness is uneven. The method comprises: improving the bilateral model so that text strokes whose width lies within a preset range are extracted; using the improved bilateral model to extract a stroke image from the original text image; converting the extracted stroke image into a binary image; and denoising the binary image. The invention can extract all strokes whose width lies within the preset range. For the image post-processing stage, it provides a new method that combines a two-level global threshold with edge detection, which effectively removes background noise while preserving the integrity of the strokes.

Description

Video OCR image-text separation method and system
Technical field
The present invention relates to the field of image processing, and in particular to a video OCR image-text separation method and system.
Background art
OCR stands for Optical Character Recognition, also called simply character recognition; it is a method of entering text automatically. Character images on paper are captured by optical input such as scanning or photographing, pattern recognition algorithms analyze the morphological features of the characters to determine their standard codes, and the result is stored in a text file in a common format.
Video text recognition (Video OCR) is an emerging research direction in video analysis and its applications, and an important direction in intelligent visual image processing for human-computer interaction. Video OCR is the technology of detecting, tracking, extracting, recognizing, and retrieving text in video: it extracts the text from video frames and converts it into an editable electronic text file. The text contained in a video is valuable for understanding the video's content and semantics. As digital video is applied ever more widely, techniques for extracting, retrieving, and querying video information are becoming more important, and Video OCR research is attracting growing attention.
In video text recognition, image-text separation is a key step: it extracts the text characters from the video frame or complex background so that the OCR engine can convert them into an editable electronic text file. Researchers have proposed many image-text separation methods, such as global thresholding, local thresholding, feature-based methods, color clustering, and stroke modeling. The stroke modeling method builds a model of the character stroke and uses it to extract character strokes from the video frame. The model used is the bilateral model of the stroke, which describes the local features of a character stroke and has been applied to extracting handwritten text from check images with various complex backgrounds.
The stroke modeling method based on the bilateral model has the following shortcoming in image-text separation:
The bilateral model is sensitive to stroke width and can only process strokes of a specified width. The method can therefore extract characters only when the stroke width is known in advance; if the stroke thickness of a character is uneven, complete character strokes cannot be extracted.
Summary of the invention
The technical problem addressed by the present invention is to provide a video OCR image-text separation method and system that solve the problem that existing recognition methods based on the bilateral model cannot extract complete strokes when stroke thickness is uneven.
To solve the above technical problem, according to the specific embodiments provided, the invention discloses the following technical solutions:
An image-text separation method, comprising:
improving the bilateral model so that text strokes within a preset width range are extracted;
extracting a stroke image from the original text image using the improved bilateral model;
converting the extracted stroke image into a binary image;
denoising the binary image, comprising: performing edge detection on the original text image and then filling the holes enclosed by the edges to obtain a template; and performing an AND operation on the template and the binary image to extract the pixels of the binary image at the template positions, yielding the text strokes with noise removed.
Preferably, the step of converting the extracted stroke image into a binary image comprises: choosing two threshold levels for the extracted stroke image to obtain a corresponding low-threshold binary image and high-threshold binary image; scanning the pixels of the high-threshold binary image in a loop and, whenever a stroke pixel is found, taking the pixel at the corresponding position in the low-threshold binary image as a seed point; and searching the low-threshold binary image for the connected component starting from that seed point. Once all pixels of the high-threshold binary image have been scanned, the connected components extracted from the low-threshold binary image form the converted binary image.
Preferably, when a stroke pixel is found in the high-threshold binary image, the method further comprises: searching the high-threshold binary image for the connected component starting from that pixel, and marking the pixels of that connected component as scanned.
Preferably, after edge detection is performed on the original text image and before the holes enclosed by the edges are filled, the method further comprises: performing edge linking on the detection result.
Preferably, before the stroke image is extracted from the original text image, the method further comprises: performing image enhancement on the original text image.
An image-text separation system, comprising:
a model building unit, configured to improve the bilateral model so that text strokes within a preset width range are extracted;
a stroke extraction unit, configured to extract a stroke image from the original text image using the improved bilateral model;
a binarization unit, configured to convert the extracted stroke image into a binary image;
a denoising unit, configured to denoise the binary image, the denoising unit further comprising: a template building subunit, configured to perform edge detection on the original text image and then fill the holes enclosed by the edges to obtain a template; and a denoising subunit, configured to perform an AND operation on the template and the binary image, extracting the pixels of the binary image at the template positions to obtain the text strokes with noise removed.
Preferably, the binarization unit further comprises: a two-level global threshold subunit, configured to choose two threshold levels for the extracted stroke image and obtain a corresponding low-threshold binary image and high-threshold binary image; and a connected component extraction subunit, configured to scan the pixels of the high-threshold binary image in a loop and, whenever a stroke pixel is found, take the pixel at the corresponding position in the low-threshold binary image as a seed point and search the low-threshold binary image for the connected component starting from that seed point. Once all pixels of the high-threshold binary image have been scanned, the connected components extracted from the low-threshold binary image form the converted binary image.
Preferably, the system further comprises: a preprocessing unit, configured to perform image enhancement on the original text image; the stroke extraction unit then extracts the stroke image from the enhanced original text image.
According to the specific embodiments provided, the invention has the following technical effects:
First, the invention improves the bilateral model so that the improved model can extract all strokes whose width lies within a preset range, removing the original model's restriction to strokes of a specified width. Strokes of uneven thickness can therefore also be extracted completely from a complex-background image or video frame.
Second, for the image post-processing stage, the invention proposes a new method combining a two-level global threshold with edge detection, which removes noise while preserving the integrity of the strokes. The extracted stroke image is first binarized with the two-level global threshold method to obtain a binary stroke image; a template that locates each character is then built by edge detection and hole filling; finally, the template is ANDed with the binary stroke image to obtain clean, complete text strokes. In the binary stroke image obtained by the two-level global threshold method, noise lies mainly between strokes, whereas the template contains mainly edge noise. Because the AND operation extracts the pixels of the binary stroke image at the template positions, background noise is removed effectively while the integrity of the shapes is preserved.
Description of drawings
Fig. 1 is a schematic diagram of the prior-art bilateral model in 1-D;
Fig. 2 is a schematic diagram of the bilateral model of the present invention in 1-D;
Fig. 3 is a flowchart of an image-text separation method provided by an embodiment of the invention;
Fig. 4a is a schematic diagram of the original text image in an embodiment of the invention;
Fig. 4b is a schematic diagram of the Laplacian-enhanced image in an embodiment of the invention;
Fig. 4c is a schematic diagram of the extracted stroke image in an embodiment of the invention;
Fig. 4d is a schematic diagram of the low-threshold binary image in an embodiment of the invention;
Fig. 4e is a schematic diagram of the high-threshold binary image in an embodiment of the invention;
Fig. 4f is a schematic diagram of the binarization result in an embodiment of the invention;
Fig. 5a is a schematic diagram of the original text image in an embodiment of the invention, Fig. 5b is a schematic diagram of the denoising result of the conditional dilation method, and Fig. 5c is a schematic diagram of the denoising result of the method of the invention;
Fig. 6a is a schematic diagram of the edge-detected image in an embodiment of the invention;
Fig. 6b is a schematic diagram of the hole-filled image in an embodiment of the invention;
Fig. 6c is a schematic diagram of the image obtained by ANDing Fig. 4f and Fig. 6b in an embodiment of the invention;
Fig. 6d is a schematic diagram of the final segmentation result in an embodiment of the invention;
Fig. 7 is a flowchart of the edge-detection-based denoising method in an embodiment of the invention;
Fig. 8 is a structural diagram of an image-text separation system provided by an embodiment of the invention.
Detailed description of the embodiments
To make the above objects, features, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
For image-text separation, the invention improves the existing bilateral model so that the improved model can extract all strokes whose width lies within a preset range, removing the original model's restriction to strokes of a specified width. The invention is also applicable to extracting text strokes from images with complex backgrounds, as detailed below.
The existing bilateral model is introduced first.
In practice, to make text in a video frame or complex-background image more legible, an outline that sets the text apart from the background is designed around the text strokes. This feature at the boundary between text and background is called the bilateral structure; see Fig. 1. The width of the bilateral structure is the stroke width.
Assume bright strokes on a dark background. Width and contrast are the two key features of a character stroke (for example, of a Chinese character). In 1-D, the stroke intensity for width W is measured by:
$$DE(x) = f(x) - \min_{i=1}^{W-1} \{ \max( f(x-i),\ f(x-i+W) ) \} \qquad (1)$$
where f(x) denotes the original text image and DE(x) the stroke image. In Fig. 1, W is the preset stroke width, and the width of the bilateral structure is the actual stroke width.
In 2-D, the stroke intensity is measured by:
$$DE_W(x, y) = \max_{d=0}^{3} \{ DE_d(x, y) \} \qquad (2)$$
A positive value of this expression indicates the presence of a bright stroke on a dark background. Here d = 0, 1, 2, 3 index the horizontal, vertical, diagonal, and anti-diagonal directions, i.e. {0, π/4, π/2, 3π/4}. The stroke image can therefore be expressed as
$$S_W(x, y) = DE_W^{+}(x, y) \qquad (3)$$
The model's sensitivity to stroke width means that it can extract characters only when the stroke width is known in advance, and only strokes of the specified width. In many practical applications, however, the stroke width cannot be obtained beforehand.
For these reasons, the improved bilateral model of the present invention is defined as follows:
$$DE(x) = f(x) - \min_{w=W_l}^{W_h} \{ \min_{i=1}^{w-1} \{ \max( f(x-i),\ f(x-i+w) ) \} \} \qquad (4)$$
where W_l and W_h denote the preset minimum and maximum stroke widths, respectively. The 1-D schematic of this model is shown in Fig. 2; the stroke intensity reaches its peak response when w equals the actual stroke width. In this model, w ranges over all stroke widths in the interval [W_l, W_h], so that the text stroke obtains its maximum response. With the existing bilateral model of Fig. 1, by contrast, if the preset stroke width W is smaller than the actual stroke width, the stroke is missed; if W is larger than the actual stroke width, the stroke also goes undetected when the background color is close to the text color.
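The difference between the fixed-width model and the width-range model can be illustrated with a minimal pure-Python sketch. The function name `double_edge_response`, the sample signals, and the handling of pixel pairs that fall outside the signal are assumptions made for this illustration; they are not taken from the patent.

```python
def double_edge_response(f, x, w_lo, w_hi):
    """Stroke intensity at position x of the 1-D signal f, following Eq. (4).

    A bright stroke on a dark background is assumed. Pixel pairs that fall
    outside the signal are skipped (a boundary choice made for this sketch).
    Setting w_lo == w_hi gives the fixed-width model of Eq. (1).
    """
    lowest = None
    for w in range(w_lo, w_hi + 1):
        for i in range(1, w):
            left, right = x - i, x - i + w
            if left < 0 or right >= len(f):
                continue
            pair = max(f[left], f[right])
            lowest = pair if lowest is None else min(lowest, pair)
    if lowest is None:  # no valid pixel pair: report no response
        return 0
    return f[x] - lowest


thick = [10] * 3 + [200] * 5 + [10] * 4  # bright run of 5 pixels
thin = [10] * 5 + [200] * 2 + [10] * 5   # bright run of 2 pixels

fixed_thin = double_edge_response(thin, 5, 3, 3)     # fixed width, thin stroke
fixed_thick = double_edge_response(thick, 5, 3, 3)   # fixed width, thick stroke
ranged_thick = double_edge_response(thick, 5, 3, 7)  # width range, thick stroke
```

In this sketch the fixed width 3 responds to the thin stroke but gives no response inside the thick one, because both compared pixels still lie on the stroke; scanning the whole range 3 to 7 recovers the thick stroke as well.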
In 2-D, the stroke intensity can be expressed as
$$DE_W(x, y) = \max_{d=0}^{3} \{ DE_d(x, y) \} \qquad (5)$$
where d = 0, 1, 2, 3 denote the horizontal, vertical, diagonal, and anti-diagonal directions, respectively. This model extracts all strokes whose width lies in the range [W_l, W_h], and so preserves the integrity of characters well. Even when stroke thickness is uneven, the invention can extract the strokes completely from a complex-background image or video frame.
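As a rough illustration of the 2-D case, the sketch below evaluates the width-range response along four directions and keeps the largest non-negative value at each pixel. The name `stroke_image`, the direction step table, and the rule of skipping pixel pairs outside the image are assumptions of this sketch, not details given in the patent.

```python
# (row step, column step) for horizontal, vertical, diagonal, anti-diagonal
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def stroke_image(img, w_lo, w_hi):
    """Per-pixel maximum directional response, clipped at zero.

    img is a list of rows of gray levels; bright strokes on a dark
    background are assumed.
    """
    h, w_img = len(img), len(img[0])
    out = [[0] * w_img for _ in range(h)]
    for y in range(h):
        for x in range(w_img):
            best = 0  # starting at 0 keeps only the positive part
            for dy, dx in DIRECTIONS:
                lowest = None
                for w in range(w_lo, w_hi + 1):
                    for i in range(1, w):
                        y1, x1 = y - i * dy, x - i * dx
                        y2, x2 = y + (w - i) * dy, x + (w - i) * dx
                        inside = (0 <= y1 < h and 0 <= x1 < w_img
                                  and 0 <= y2 < h and 0 <= x2 < w_img)
                        if not inside:
                            continue
                        pair = max(img[y1][x1], img[y2][x2])
                        lowest = pair if lowest is None else min(lowest, pair)
                if lowest is not None:
                    best = max(best, img[y][x] - lowest)
            out[y][x] = best
    return out


# A vertical bright bar, two pixels wide, on a dark background.
bar = [[10, 10, 10, 200, 200, 10, 10] for _ in range(7)]
strokes = stroke_image(bar, 2, 4)
```

Here the bar is found by the horizontal direction, while the vertical direction, which runs along the bar, contributes nothing.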
The procedure for extracting character strokes with the improved bilateral model is as follows:
[Figure GDA0000058122470000061: stroke extraction procedure; image not reproduced]
Although this procedure extracts complete character strokes, the model also mistakenly extracts stroke-like connected components between strokes. The extracted text therefore still contains noise between strokes that must be removed; this stage is called image post-processing. Here, the connected components in question are the noise between strokes, such as the spurious stroke between the strokes of the character rendered as "discipline" in Fig. 1.
The denoising method of this embodiment is as follows: first, binarize the extracted stroke image with the two-level global threshold method to obtain a binary stroke image; next, build a template by edge detection and hole filling, which locates the position of each character; finally, AND the template with the binary stroke image to obtain clean, complete text strokes.
The complete text recognition method is described in detail below with an example.
Fig. 3 is a flowchart of an image-text separation method provided by an embodiment of the invention. The method extracts clear, complete text from video or complex images, as follows:
S301: perform image enhancement on the original text line.
This is a preferred preprocessing step whose purpose is to emphasize the edges and details of the strokes. Image enhancement selectively strengthens certain information in an image, such as edges, contours, and contrast, to facilitate subsequent processing and analysis.
Image processing offers many enhancement methods, such as those based on mathematical morphology or on rough set theory. This embodiment enhances the original text line with the Laplacian sharpening operator. The Laplacian is an enhancement operator based on second-order derivatives and is isotropic, that is, the filter response is independent of the direction of the discontinuities in the image. The basic Laplacian enhancement can be written as:
g(x,y)=5f(x,y)-[f(x+1,y)+f(x-1,y)+f(x,y+1)+f(x,y-1)] (6)
Referring to Fig. 4, Fig. 4a is the original text image and Fig. 4b is the Laplacian-enhanced image. As the figures show, the Laplacian enhancement increases the contrast at gray-level discontinuities, so the details of the image are enhanced.
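Equation (6) translates directly into a short routine. The following is a minimal sketch; copying the border pixels unchanged and clipping the result to the range 0 to 255 are assumptions of this example that the patent does not specify.

```python
def laplacian_sharpen(img):
    """Apply g = 5f - (sum of the four neighbors) to every interior pixel."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # border pixels are copied unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            g = 5 * img[y][x] - (img[y][x + 1] + img[y][x - 1]
                                 + img[y + 1][x] + img[y - 1][x])
            out[y][x] = min(255, max(0, g))  # clip to the 8-bit range
    return out


flat = [[100] * 3 for _ in range(3)]
bump = [[100, 100, 100],
        [100, 120, 100],
        [100, 100, 100]]
flat_result = laplacian_sharpen(flat)
bump_result = laplacian_sharpen(bump)
```

A flat region is left unchanged, whereas a pixel that is 20 gray levels above its neighbors ends up 100 levels above them, which is the contrast boost at discontinuities described above.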
S302: extract the text strokes from the video or image using the improved bilateral model.
The strokes are extracted by the method described above; the resulting stroke image is a grayscale image. Fig. 4c shows the extracted stroke image. The extracted text reads "he say you this age", but the image still contains background and a large amount of noise. The denoising process is as follows.
S303: convert the extracted stroke image into a binary image, thereby removing the background.
The purpose of binarization is to remove the background. This embodiment uses the two-level global threshold method to convert the grayscale stroke image into a binary image. Global thresholding is a kind of threshold segmentation, and threshold segmentation is a region-based image segmentation technique. Its basic principle is to divide the image pixels into several classes by setting different feature thresholds. If the pixels are divided into two classes, black and white, the segmentation result is a binary image, and the process is called image binarization. Global thresholding uses global information to obtain the optimal segmentation threshold for the whole image; it may use a single threshold or several.
The two-level global threshold method is a threshold segmentation method built on global thresholding. This embodiment binarizes the image with it as follows:
First, two segmentation thresholds are selected for the stroke image: a low threshold, whose segmentation result is the low-threshold binary image, and a high threshold, whose segmentation result is the high-threshold binary image. Fig. 4d and Fig. 4e show the low-threshold and high-threshold binary images, respectively. As the figures show, the text in the low-threshold binary image is clearer and more complete but includes more background, whereas the high-threshold binary image has less noise but the text is more fragmented.
This embodiment chooses the low and high thresholds from the Otsu threshold: α times the Otsu threshold is the low threshold, and β times the Otsu threshold (β > α) is the high threshold. The Otsu method, also known as the maximum between-class variance method, was proposed by the Japanese scholar Otsu in 1979 and is an adaptive method for determining a threshold. It is a simple and efficient way to compute a single threshold adaptively (for converting a grayscale image into a binary image). The algorithm analyzes the histogram of the input grayscale image and splits it into two parts such that the between-class variance of the two parts is maximal; the split point is the threshold sought.
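The threshold selection can be sketched as follows. The patent does not give values for α and β, so the defaults 0.8 and 1.2 below are purely hypothetical, as are the function names and the convention that a pixel belongs to the stroke when it is strictly greater than the threshold.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance
    when integer pixels <= t form one class and pixels > t the other."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(levels))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance (unnormalized)
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def two_level_binarize(img, alpha=0.8, beta=1.2):
    """Binarize img at alpha*Otsu and beta*Otsu (alpha and beta are hypothetical)."""
    t = otsu_threshold([p for row in img for p in row])
    low = [[1 if p > alpha * t else 0 for p in row] for row in img]
    high = [[1 if p > beta * t else 0 for p in row] for row in img]
    return low, high


stroke_gray = [[0, 90, 110, 250],
               [0, 0, 250, 250]]
otsu = otsu_threshold([p for row in stroke_gray for p in row])
low_img, high_img = two_level_binarize(stroke_gray)
```

For this small example the low-threshold image keeps the faint pixels 90 and 110, while the high-threshold image keeps only the strongest ones.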
Second, the pixels of the high-threshold binary image are scanned. Whenever a stroke pixel is found, the pixel at the corresponding position in the low-threshold binary image is taken as a seed point, and the low-threshold binary image is searched for the connected component starting from that seed point. These steps are repeated; once all pixels of the high-threshold binary image have been scanned, the connected components extracted from the low-threshold binary image form the converted binary image.
Preferably, when a stroke pixel is found in the high-threshold binary image, the high-threshold binary image is also searched for the connected component starting from that pixel, and the pixels of that component are marked as scanned.
For example, in Fig. 4d and Fig. 4e the background is black, with pixel value 0, and the text strokes are white, with pixel value 1. The binarization proceeds as follows: scan each pixel of the high-threshold binary image in turn, top to bottom and left to right. For each "1" pixel found, take the pixel at the corresponding position in the low-threshold binary image as a seed point and search for its connected component, that is, for the stroke. At the same time, search the high-threshold binary image for the connected component seeded at that "1" pixel and set all pixels of that component to 0, so that the search in the low-threshold binary image is not repeated for those pixels. Continue scanning until no "1" pixels remain in the high-threshold binary image. Fig. 4f shows the binarization result.
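The scan-and-seed procedure can be sketched with a breadth-first search. The helper names `pop_component` and `combine_two_level`, and the choice of 8-connectivity for the connected components, are assumptions of this illustration.

```python
from collections import deque

NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]


def pop_component(mask, seed):
    """Clear and return the 8-connected component of 1-pixels containing seed."""
    sy, sx = seed
    if mask[sy][sx] != 1:
        return []
    h, w = len(mask), len(mask[0])
    mask[sy][sx] = 0
    queue = deque([seed])
    pixels = []
    while queue:
        y, x = queue.popleft()
        pixels.append((y, x))
        for dy, dx in NEIGHBORS_8:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1:
                mask[ny][nx] = 0
                queue.append((ny, nx))
    return pixels


def combine_two_level(low, high):
    """Keep the low-threshold components that contain a high-threshold pixel."""
    low = [row[:] for row in low]    # work on copies, inputs stay intact
    high = [row[:] for row in high]
    result = [[0] * len(row) for row in low]
    for y in range(len(high)):
        for x in range(len(high[y])):
            if high[y][x] != 1:
                continue
            pop_component(high, (y, x))  # mark this high component as scanned
            for cy, cx in pop_component(low, (y, x)):
                result[cy][cx] = 1
    return result


low_mask = [[1, 1, 1, 0, 1],
            [0, 0, 0, 0, 0],
            [1, 0, 0, 0, 0]]
high_mask = [[0, 1, 0, 0, 0],
             [0, 0, 0, 0, 0],
             [0, 0, 0, 0, 0]]
binary = combine_two_level(low_mask, high_mask)
```

The two isolated pixels in `low_mask` have no counterpart in `high_mask` and are dropped, while the full three-pixel run seeded by the single high-threshold pixel is kept.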
As described above, the two-level global threshold method combines the low-threshold and high-threshold binary images: it retains the clarity and completeness of the low-threshold binary image while using the lower noise of the high-threshold binary image to remove noise from the low-threshold binary image. This binarization therefore both reduces background noise and yields clear, complete text.
The binarization result still contains noise, however. In particular, noise between adjacent strokes is not fully removed, so further denoising is needed.
S304: denoise the binary image to remove the noise between strokes.
In image processing, denoising methods based on mathematical morphology, such as erosion and dilation, are commonly used. Although erosion and dilation can fill holes and remove isolated noise points, they also damage the shape and details of the target. For text targets in particular, erosion and dilation severely damage the complex structure and edge details.
In view of these traditional methods, the invention proposes an edge-detection-based denoising method that removes noise while preserving the complex structure and edge details of the characters. Fig. 5 compares a traditional method with the invention: Fig. 5a is the original text image, Fig. 5b is the denoising result of the conditional dilation method, and Fig. 5c is the denoising result of the method of the invention. As the figures show, the method of the invention preserves the integrity of the shapes while removing the background.
Because false detections often occur between adjacent strokes, the basic idea of the invention is to design a text template by edge detection that locates each character, and thereby remove the false detections between strokes. The procedure is: first, perform edge detection on the original text image to obtain the edge-detected image shown in Fig. 6a, then fill the holes enclosed by the edges to obtain the required template, shown in Fig. 6b; next, AND the template with the binary stroke image to extract the pixels of the binary stroke image at the template positions, giving the text strokes with noise removed, shown in Fig. 6c; finally, filter out very small connected components to obtain a text segmentation result of satisfactory quality, shown in Fig. 6d.
In the binary stroke image obtained by the two-level global threshold method, noise lies mainly between strokes, whereas in the template the noise is mainly edge noise, that is, noise lying mainly in the background. The AND operation extracts exactly the pixels of the binary stroke image at the template positions, and so removes background noise effectively while preserving the integrity of the shapes.
This edge-detection-based denoising method is described in detail below. Fig. 7 is the flowchart of the method, as follows:
S701: perform edge detection on the original text image (such as Fig. 4a) using the Canny edge detection operator.
Image processing offers many edge detection methods; this embodiment uses the Canny edge detection operator. The Canny operator is regarded as one of the most effective edge detection operators. The method is as follows:
1) The image is smoothed with a Gaussian filter of standard deviation σ to reduce noise;
2) The local gradient $g(x, y) = \sqrt{G_x^2 + G_y^2}$ and the edge direction $\alpha(x, y) = \arctan(G_y / G_x)$ are computed at every point, where G_x and G_y denote the first-order partial derivatives in the x and y directions, respectively;
3) The edge points determined in step 2 give rise to ridges in the gradient magnitude image. The algorithm then tracks along the top of these ridges and sets to 0 all pixels that are not on a ridge top, so that the output contains a thin line; this is called non-maximum suppression. The ridge pixels are then thresholded with two thresholds T1 and T2, where T1 < T2. Ridge pixels greater than T2 are called strong edge pixels, and ridge pixels between T1 and T2 are called weak edge pixels;
4) Finally, the algorithm links edges by connecting the weak edge pixels that are 8-connected to strong edge pixels.
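Steps 2 and 3 can be illustrated for a single pixel. This sketch is not a full Canny detector: it omits the Gaussian smoothing, non-maximum suppression, and edge linking. It assumes 3x3 Sobel kernels as the derivative estimator, takes the gradient magnitude as the Euclidean norm of (G_x, G_y), and uses `math.atan2` in place of arctan(G_y / G_x) to avoid division by zero; the function names and threshold values are hypothetical.

```python
import math


def sobel_gradient(img, y, x):
    """Gradient magnitude and direction at an interior pixel (Sobel kernels)."""
    gx = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
          - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
    gy = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
          - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
    return math.hypot(gx, gy), math.atan2(gy, gx)


def classify_edge_pixel(magnitude, t1, t2):
    """Double thresholding with t1 < t2, as in step 3."""
    if magnitude > t2:
        return "strong"
    if magnitude >= t1:
        return "weak"
    return "none"


# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 100],
        [0, 0, 100],
        [0, 0, 100]]
magnitude, direction = sobel_gradient(step, 1, 1)
label = classify_edge_pixel(magnitude, 100, 200)
```

For the vertical step edge the gradient points along the x axis, so the direction is 0 and the pixel is classified as a strong edge pixel under the example thresholds.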
Preferably, S702: perform edge linking on the edge detection result.
Edge linking joins similar points within a small neighborhood to form an edge. The two main properties that determine the similarity of edge pixels are: 1) the gradient of the edge pixels; 2) the direction of the gradient. In this embodiment, for the binary image produced by Canny edge detection, two pixels within a small neighborhood are connected if both are "1" pixels in the horizontal, vertical, diagonal, or anti-diagonal direction and the gradient direction is greater than a preset threshold.
S703: fill the holes enclosed by the edges to obtain the template.
Hole filling fills the hole regions of an image. Many hole filling methods exist as well; the one used in this embodiment is as follows:
Let A denote an image containing an 8-connected boundary region; the goal is to fill the whole hole region with 1s. The method is:
$$X_k = (X_{k-1} \oplus B) \cap A^c, \qquad k = 1, 2, 3, \ldots$$
where X_0 is the original binary image, B is a symmetric structuring element, and A^c is the complement of A. If X_k = X_{k-1}, the algorithm terminates at iteration step k. The union of X_k and A contains the filled set and its boundary.
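For illustration, hole filling can also be done with a flood fill from the image border instead of the dilation iteration given above. The sketch below uses that swapped-in technique: every background pixel that is 4-connected to the border is treated as outside, and everything else (edges and enclosed holes) becomes part of the template. The function name and the assumption that no hole touches the image border are choices of this sketch.

```python
from collections import deque


def fill_holes(edges):
    """Return a 0/1 template in which edge pixels and enclosed holes are 1."""
    h, w = len(edges), len(edges[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed the search with every background pixel on the image border.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and edges[y][x] == 0:
                outside[y][x] = True
                queue.append((y, x))
    # 4-connected flood fill, so 8-connected edges are not leaked through.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w
                    and edges[ny][nx] == 0 and not outside[ny][nx]):
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]


ring = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 0, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]
template = fill_holes(ring)
```

The background is searched with 4-connectivity because the boundary is 8-connected; an 8-connected background search could slip diagonally between two edge pixels.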
S704: AND the template with the binary stroke image to extract the pixels of the binary stroke image at the template positions; this effectively removes background noise.
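The AND step is a pixel-wise operation on two images of the same size. A minimal sketch, with the function name and sample data chosen for illustration only:

```python
def apply_template(binary, template):
    """Pixel-wise AND of two equally sized 0/1 images."""
    return [[b & t for b, t in zip(b_row, t_row)]
            for b_row, t_row in zip(binary, template)]


binary_strokes = [[1, 1, 0, 1],
                  [0, 1, 0, 1]]
char_template = [[1, 1, 0, 0],
                 [1, 1, 0, 0]]
cleaned = apply_template(binary_strokes, char_template)
```

Pixels that are set in the binary stroke image but lie outside the template, such as the right-hand column here, are discarded.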
Preferably, S705, the connected component that filtering is minimum, i.e. the little noise of filtering obtains final text segmentation result.
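Steps S704 and S705 can be illustrated with the sketch below. It is only an example under stated assumptions: the function name `denoise`, the parameter `min_size` (the smallest component size to keep) and the use of 8-connectivity are all hypothetical, since the text does not give a size threshold or a connectivity rule.

```python
from collections import deque
import numpy as np

def denoise(strokes, template, min_size):
    """AND the stroke image with the template, then drop tiny components."""
    masked = strokes.astype(bool) & template.astype(bool)   # S704
    h, w = masked.shape
    seen = np.zeros_like(masked)
    out = np.zeros_like(masked)
    for y, x in zip(*np.nonzero(masked)):
        if seen[y, x]:
            continue
        # Collect one 8-connected component by breadth-first search.
        seen[y, x] = True
        comp = [(y, x)]
        queue = deque(comp)
        while queue:
            cy, cx = queue.popleft()
            for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                    if masked[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        comp.append((ny, nx))
                        queue.append((ny, nx))
        if len(comp) >= min_size:                            # S705
            for cy, cx in comp:
                out[cy, cx] = True
    return out

strokes = np.zeros((5, 6), dtype=bool)
strokes[1, 0:3] = True      # a short stroke inside the template
strokes[3, 5] = True        # isolated speck inside the template
strokes[4, 0] = True        # background noise outside the template
template = np.zeros((5, 6), dtype=bool)
template[0:3, 0:4] = True
template[3, 5] = True
clean = denoise(strokes, template, min_size=2)
```

The pixel outside the template is removed by the AND operation, and the single-pixel speck is removed by the size filter.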
Corresponding to the text recognition method described above, the present invention also provides an embodiment of an image-text separation system. Fig. 8 is a structural diagram of the image-text separation system of the embodiment. The system mainly comprises a model building unit U801, a stroke extraction unit U802, a binarization unit U803 and a denoising unit U804. Preferably, it also comprises a preprocessing unit U805.
The model building unit U801 improves the original bilateral model so that all strokes whose width lies within a preset range can be extracted, which solves the problem that the original model can only extract strokes of a specified width. The preprocessing unit U805 applies image enhancement to the original text image to emphasise the edges and details of the strokes; this embodiment enhances the original text line with the Laplacian sharpening operator. The stroke extraction unit U802 uses the improved bilateral model to extract a complete stroke image from the original text image.
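The enhancement performed by the preprocessing unit can be illustrated with a common form of Laplacian sharpening. The kernel is not specified in the text, so this sketch assumes the 4-neighbour Laplacian, an 8-bit grey-level range of 0 to 255, and replicated borders; the name `laplacian_sharpen` is hypothetical.

```python
import numpy as np

def laplacian_sharpen(gray):
    """Sharpen a grey-level image by subtracting its 4-neighbour Laplacian."""
    g = gray.astype(np.float64)
    p = np.pad(g, 1, mode="edge")       # replicate border pixels
    lap = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * g)
    return np.clip(g - lap, 0, 255)     # assumed 8-bit output range

# A vertical step edge: dark on the left, bright on the right.
img = np.array([[50, 50, 150, 150]] * 3, dtype=float)
sharp = laplacian_sharpen(img)
```

On the step edge, the dark side next to the edge becomes darker and the bright side becomes brighter, which is the edge emphasis the preprocessing step aims for; flat regions are left unchanged.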
The binarization unit U803 converts the extracted stroke image into a binary image. In this embodiment, a two-level global threshold method is used to binarize the extracted stroke image, yielding a binary stroke image. In accordance with the two-level global threshold method, the binarization unit U803 further comprises:
A two-level global threshold subunit, which selects two threshold levels for the extracted stroke image to obtain a corresponding low-threshold binary image and high-threshold binary image;
A connected component extraction subunit, which scans the pixels of the high-threshold binary image; when a pixel corresponding to a stroke is scanned, the pixel at the corresponding position in the low-threshold binary image is taken as a seed point, and the connected component is searched for in the low-threshold binary image starting from that seed point. These steps are repeated, and once all pixels of the high-threshold binary image have been scanned, the connected components extracted from the low-threshold binary image constitute the converted binary image.
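The two-level global threshold procedure described above can be sketched as follows. The sketch assumes that stroke pixels have larger values than background pixels in the extracted stroke image, that `t_high >= t_low`, and that components are 8-connected; the function name and both threshold parameters are hypothetical.

```python
from collections import deque
import numpy as np

def two_level_binarize(stroke_img, t_low, t_high):
    """Grow components in the low-threshold image from high-threshold seeds."""
    low = stroke_img >= t_low           # low-threshold binary image
    high = stroke_img >= t_high         # high-threshold binary image
    h, w = low.shape
    result = np.zeros_like(low)
    for y, x in zip(*np.nonzero(high)):
        if result[y, x]:
            continue                    # seed already covered by a component
        result[y, x] = True
        queue = deque([(y, x)])
        while queue:
            cy, cx = queue.popleft()
            for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                    if low[ny, nx] and not result[ny, nx]:
                        result[ny, nx] = True
                        queue.append((ny, nx))
    return result

img = np.array([[0, 0, 0, 0, 0],
                [6, 6, 9, 0, 6],
                [0, 0, 0, 0, 0]])
binary = two_level_binarize(img, t_low=5, t_high=8)
```

Only low-threshold components that contain at least one high-threshold pixel survive, so the faint isolated pixel in the last column is dropped while the faint pixels attached to the strong one are kept.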
The denoising unit U804 denoises the binary image, mainly removing the noise between strokes. This embodiment uses a denoising method based on edge detection, which removes noise while preserving the complex structure and edge details of the characters. Based on this method, the denoising unit U804 further comprises:
A template building subunit, which performs edge detection and edge linking on the original text image and then fills the holes enclosed by the edges to obtain a template; the template is used to locate each character. In this embodiment, the template building subunit performs edge detection on the original text image with the Canny edge detection operator;
A denoising subunit, which performs an AND operation on the template and the binary image, extracting the pixels of the binary image at the positions corresponding to the template, to obtain the text strokes with noise removed.
In summary, the system of the present invention can completely extract strokes of uneven thickness from an image or video frame with a complex background. Moreover, the denoising performed by the system effectively removes background noise while preserving the integrity of the character shapes.
For reasons of length, the parts of the system shown in Fig. 8 that are not described in detail can be found in the relevant parts of the method shown in Figs. 1 to 7 and are not repeated here.
The video OCR image-text separation method and system provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principle and embodiments of the invention; the description of the above embodiments is only intended to help in understanding the method of the invention and its core idea. At the same time, a person of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the contents of this description should not be construed as limiting the invention.

Claims (8)

1. An image-text separation method, characterized by comprising:
improving a bilateral model so that text strokes whose width lies within a predetermined range are extracted;
extracting a stroke image from an original text image using the improved bilateral model;
converting the extracted stroke image into a binary image;
denoising the binary image, comprising:
performing edge detection on the original text image and then filling the holes enclosed by the edges to obtain a template;
performing an AND operation on the template and the binary image, extracting the pixels of the binary image at the positions corresponding to the template, to obtain the text strokes with noise removed.
2. The method according to claim 1, characterized in that the step of converting the extracted stroke image into a binary image comprises:
selecting two threshold levels for the extracted stroke image to obtain a corresponding low-threshold binary image and high-threshold binary image;
scanning the pixels of the high-threshold binary image in a loop; when a pixel corresponding to a stroke is scanned, taking the pixel at the corresponding position in the low-threshold binary image as a seed point, and searching for the connected component in the low-threshold binary image starting from that seed point;
after all pixels of the high-threshold binary image have been scanned, the connected components extracted from the low-threshold binary image constitute the converted binary image.
3. The method according to claim 2, characterized in that, when a pixel corresponding to a stroke is scanned in the high-threshold binary image, the method further comprises:
searching for the connected component in the high-threshold binary image starting from that pixel, and marking the pixels of that connected component as scanned.
4. The method according to claim 1, characterized in that, after performing edge detection on the original text image and before filling the holes enclosed by the edges, the method further comprises:
performing edge linking on the detection result.
5. The method according to claim 1, characterized in that, before extracting the stroke image from the original text image, the method further comprises:
performing image enhancement on the original text image.
6. An image-text separation system, characterized by comprising:
a model building unit, configured to improve a bilateral model so that text strokes whose width lies within a predetermined range are extracted;
a stroke extraction unit, configured to extract a stroke image from an original text image using the improved bilateral model;
a binarization unit, configured to convert the extracted stroke image into a binary image;
a denoising unit, configured to denoise the binary image, the denoising unit further comprising:
a template building subunit, configured to perform edge detection on the original text image and then fill the holes enclosed by the edges to obtain a template;
a denoising subunit, configured to perform an AND operation on the template and the binary image, extracting the pixels of the binary image at the positions corresponding to the template, to obtain the text strokes with noise removed.
7. The system according to claim 6, characterized in that the binarization unit further comprises:
a two-level global threshold subunit, configured to select two threshold levels for the extracted stroke image to obtain a corresponding low-threshold binary image and high-threshold binary image;
a connected component extraction subunit, configured to scan the pixels of the high-threshold binary image in a loop; when a pixel corresponding to a stroke is scanned, to take the pixel at the corresponding position in the low-threshold binary image as a seed point and search for the connected component in the low-threshold binary image starting from that seed point; after all pixels of the high-threshold binary image have been scanned, the connected components extracted from the low-threshold binary image constitute the converted binary image.
8. The system according to claim 6, characterized by further comprising:
a preprocessing unit, configured to perform image enhancement on the original text image;
wherein the stroke extraction unit extracts the stroke image from the enhanced original text image.
CN2008101136592A 2008-05-29 2008-05-29 Video OCR image-text separation method and system Expired - Fee Related CN101593276B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101136592A CN101593276B (en) 2008-05-29 2008-05-29 Video OCR image-text separation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008101136592A CN101593276B (en) 2008-05-29 2008-05-29 Video OCR image-text separation method and system

Publications (2)

Publication Number Publication Date
CN101593276A CN101593276A (en) 2009-12-02
CN101593276B true CN101593276B (en) 2011-10-12

Family

ID=41407924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101136592A Expired - Fee Related CN101593276B (en) 2008-05-29 2008-05-29 Video OCR image-text separation method and system

Country Status (1)

Country Link
CN (1) CN101593276B (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5228095A (en) * 1991-02-26 1993-07-13 Sony Corporation Apparatus for recognizing printed characters
CN1430764A (en) * 2000-05-22 2003-07-16 国际商业机器公司 Finding objects in image


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JP特开2007-172639A 2007.07.05
张重阳."票据自动处理系统中的预处理技术研究".《票据自动处理系统中的预处理技术研究》.2004,31-33. *



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111012

Termination date: 20170529
