CN101593276B

CN101593276B - Video OCR image-text separation method and system

Info

Publication number: CN101593276B
Application number: CN2008101136592A
Authority: CN
Inventors: 禹晶; 黄磊; 刘昌平
Original assignee: Hanwang Technology Co Ltd
Current assignee: Hanwang Technology Co Ltd
Priority date: 2008-05-29
Filing date: 2008-05-29
Publication date: 2011-10-12
Anticipated expiration: 2028-05-29
Also published as: CN101593276A

Abstract

The invention discloses a video optical character recognition (OCR) image-text separation method and system. They solve the problem that existing methods based on the bilateral model cannot extract complete strokes when stroke thickness is uneven. The method comprises four steps. First, the bilateral model is improved so that text strokes whose width lies within a preset range are extracted. Second, a stroke image is extracted from the original text image using the improved bilateral model. Third, the extracted stroke image is converted into a binary image. Fourth, the binary image is denoised. The invention can extract all strokes whose width lies within the preset range. For the image post-processing stage, it provides a new method that combines two-level global thresholding with edge detection. This method effectively removes background noise while preserving stroke integrity.

Description

Video OCR image-text separation method and system

Technical field

The present invention relates to the field of image processing, and in particular to a video OCR image-text separation method and system.

Background art

OCR stands for Optical Character Recognition, also called simply character recognition. It is a method for automatic text input. It acquires character images from paper through optical input such as scanning or photographing. It then analyzes the morphological features of the characters with pattern recognition algorithms, determines the standard codes of the Chinese characters, and stores them in a text file in a general format.

Video text recognition (video OCR) is an emerging research direction in video analysis and application. It is also an important direction for intelligent visual image processing aimed at human-computer interaction. Video OCR detects, tracks, extracts, recognizes, and retrieves text in video: it extracts the text in video frames and converts it into editable electronic documents. Text in video carries information of great value for understanding the content and semantics of the video. Digital video is now used ever more widely across many fields, so techniques for extracting, retrieving, and querying video information are becoming increasingly important. As a result, video OCR research has gradually become a focus.

Image-text separation is a critical step in video text recognition. It extracts text characters from video frames or complex backgrounds so that an OCR engine can convert them into editable electronic documents. Researchers have proposed many image-text separation methods, such as global thresholding, local thresholding, feature-based methods, color clustering, and stroke modeling. The stroke modeling method builds a model of character strokes and uses it to extract strokes from video frames. The model it uses is the stroke-based bilateral model, which describes local features of character strokes. This model has been applied to extracting handwritten text from cheque images against various complex backgrounds.

The stroke modeling method based on the bilateral model has the following shortcoming in image-text separation:

The bilateral model is sensitive to stroke width, so it can only process strokes of a specified width. The method can therefore extract characters only when the stroke width is known in advance. If the stroke thickness of a character is irregular, complete character strokes cannot be extracted.

Summary of the invention

The technical problem solved by the present invention is as follows. Existing recognition methods based on the bilateral model cannot extract complete strokes when stroke thickness is uneven. The invention provides a video OCR image-text separation method and system that overcome this problem.

To solve the above technical problem, the invention discloses the following technical solutions, according to specific embodiments provided by the present invention:

An image-text separation method, comprising:

improving the bilateral model so that text strokes whose width lies within a preset range are extracted;

extracting a stroke image from the original text image using the improved bilateral model;

converting the extracted stroke image into a binary image; and

denoising the binary image, which comprises: performing edge detection on the original text image and filling the holes enclosed by edges to obtain a template; and performing an AND operation on the template and the binary image, which keeps the pixels of the binary image at the template positions and yields text strokes with noise removed.

Preferably, converting the extracted stroke image into a binary image comprises the following steps. Two threshold levels are chosen for the extracted stroke image, which gives a low-threshold binary image and a high-threshold binary image. The pixels of the high-threshold binary image are scanned cyclically. Whenever a stroke pixel is found, the pixel at the corresponding position in the low-threshold binary image is taken as a seed point, and a connected component is searched for from this seed point in the low-threshold binary image. After all pixels in the high-threshold binary image have been scanned, the connected components extracted from the low-threshold binary image form the converted binary image.

Preferably, when a stroke pixel is found in the high-threshold binary image, the method further comprises: searching for the connected component containing this pixel in the high-threshold binary image, and marking the pixels of that connected component as scanned.

Preferably, the method further comprises performing edge linking on the detection result. This is done after edge detection on the original text image and before the holes enclosed by edges are filled.

Preferably, before the stroke image is extracted from the original text image, the method further comprises: performing image enhancement on the original text image.

An image-text separation system, comprising:

a model design unit, configured to improve the bilateral model so that text strokes whose width lies within a preset range are extracted;

a stroke extraction unit, configured to extract a stroke image from the original text image using the improved bilateral model;

a binarization unit, configured to convert the extracted stroke image into a binary image; and

a denoising unit, configured to denoise the binary image. The denoising unit further comprises a template creation subunit and a denoising subunit. The template creation subunit performs edge detection on the original text image and fills the holes enclosed by edges to obtain a template. The denoising subunit performs an AND operation on the template and the binary image, which keeps the pixels of the binary image at the template positions and yields text strokes with noise removed.

Preferably, the binarization unit further comprises the following subunits:

a two-level global threshold subunit, configured to choose two threshold levels for the extracted stroke image, which gives a low-threshold binary image and a high-threshold binary image; and

a connected component extraction subunit, configured to cyclically scan the pixels of the high-threshold binary image. Whenever a stroke pixel is found, it takes the pixel at the corresponding position in the low-threshold binary image as a seed point and searches for a connected component from this seed point in the low-threshold binary image. After all pixels in the high-threshold binary image have been scanned, the connected components extracted from the low-threshold binary image form the converted binary image.

Preferably, the system further comprises a preprocessing unit, configured to perform image enhancement on the original text image. The stroke extraction unit then extracts the stroke image from the enhanced original text image.

According to the specific embodiments provided by the present invention, the invention has the following technical effects:

First, the present invention improves the bilateral model so that it can extract all strokes whose width lies within a preset range. This solves the problem that the original model can only extract strokes of a specified width. As a result, even strokes of uneven thickness can be extracted completely from complex-background images or video frames.

Second, for the image post-processing stage, the present invention proposes a new method that combines two-level global thresholding with edge detection. This method removes noise while preserving stroke integrity, in three steps. First, two-level global thresholding binarizes the extracted stroke image to obtain a binary stroke image. Next, edge detection and hole filling build a template that locates each character. Finally, an AND operation on the template and the binary stroke image yields clean, complete text strokes. In the binary stroke image, noise lies mainly between strokes, whereas the template mainly contains edge noise. Because the noise in the two images lies in different places, the AND operation removes background noise effectively while preserving shape integrity. The AND operation keeps only the pixels of the binary stroke image at template positions.

Description of drawings

Fig. 1 is a schematic diagram of the 1-D bilateral model in the prior art;

Fig. 2 is a schematic diagram of the 1-D bilateral model of the present invention;

Fig. 3 is a flowchart of an image-text separation method provided by an embodiment of the present invention;

Fig. 4a is a schematic diagram of the original text image in an embodiment of the present invention;

Fig. 4b is a schematic diagram of the Laplacian-enhanced image in an embodiment of the present invention;

Fig. 4c is a schematic diagram of the extracted stroke image in an embodiment of the present invention;

Fig. 4d is a schematic diagram of the low-threshold binary image in an embodiment of the present invention;

Fig. 4e is a schematic diagram of the high-threshold binary image in an embodiment of the present invention;

Fig. 4f is a schematic diagram of the binarization result in an embodiment of the present invention;

Fig. 5a is a schematic diagram of the original text image in an embodiment of the present invention, Fig. 5b is a schematic diagram of the denoising result of the conditional dilation method, and Fig. 5c is a schematic diagram of the denoising result of the method of the present invention;

Fig. 6a is a schematic diagram of the edge-detected image in an embodiment of the present invention;

Fig. 6b is a schematic diagram of hole filling in an embodiment of the present invention;

Fig. 6c is a schematic diagram of the AND of Fig. 4f and Fig. 6b in an embodiment of the present invention;

Fig. 6d is a schematic diagram of the final segmentation result in an embodiment of the present invention;

Fig. 7 is a flowchart of the edge-detection-based denoising method in an embodiment of the present invention;

Fig. 8 is a structural diagram of an image-text separation system provided by an embodiment of the present invention.

Embodiment

To make the above objectives, features, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.

For image-text separation, the present invention improves the existing bilateral model so that it can extract all strokes whose width lies within a preset range. This solves the problem that the original model can only extract strokes of a specified width. The invention is also applicable to extracting text strokes from complex-background images. The details are described below.

The existing bilateral model is introduced first.

In practice, text in video frames or complex-background images is often made more readable by a contour around the strokes that sets the text apart from the background. This boundary feature between text and background is called the bilateral structure, as shown in Fig. 1. The width of the bilateral structure is the stroke width.

Assume bright strokes on a dark background. Width and contrast are two key features of character strokes (e.g., strokes of Chinese characters). In 1-D, the stroke intensity for a stroke of width W is measured by the following formula:

DE(x) = f(x) - \min_{i=1}^{W-1} \left\{ \max\left( f(x-i),\ f(x-i+W) \right) \right\} \qquad (1)

where f(x) denotes the original text image and DE(x) denotes the stroke image. In Fig. 1, W is the preset stroke width, and the width of the bilateral structure is the actual stroke width.
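A rough illustration of Eq. (1) follows as a Python sketch. It computes the 1-D response for a preset width W on a list of gray levels. The sketch assumes bright strokes on a dark background, as the text does. Indices that fall outside the signal are clamped to the nearest border sample, because the source does not say how borders are handled. The function name `de_fixed` is hypothetical.

```python
def de_fixed(f, W):
    """1-D bilateral-model stroke response for one preset width W (Eq. (1)).

    f: list of gray levels, bright strokes on a dark background.
    Indices outside the signal are clamped to the nearest sample (assumption).
    """
    if W < 2:
        raise ValueError("W must be at least 2 so that i can range over 1..W-1")
    n = len(f)

    def at(k):
        # Clamp k into [0, n-1] instead of reading outside the signal.
        return f[min(max(k, 0), n - 1)]

    out = []
    for x in range(n):
        # For each offset i, x-i and x-i+W flank a window of width W that
        # contains x; a bright stroke pixel must exceed both flanking pixels.
        worst = min(max(at(x - i), at(x - i + W)) for i in range(1, W))
        out.append(f[x] - worst)
    return out


f = [0, 0, 0, 10, 10, 0, 0, 0]  # a 2-pixel-wide bright stroke
response = de_fixed(f, 3)       # [0, 0, -10, 10, 10, -10, 0, 0]
```

With W = 3, the response is positive only on the two stroke pixels. With W = 2, the window cannot span the 2-pixel stroke, so the response on the stroke drops to 0. This is the width sensitivity discussed below.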

In 2-D, the stroke intensity is measured by the following formula:

DE_W(x, y) = \max_{d=0}^{3} \left\{ DE_d(x, y) \right\} \qquad (2)

A positive value of this expression indicates a bright stroke on a dark background. Here d = 0, 1, 2, 3 indexes the horizontal, vertical, diagonal, and anti-diagonal directions, i.e., the four directions at angles 0, π/4, π/2, and 3π/4. DE_d(x, y) is the 1-D response of Eq. (1) computed along direction d. The stroke image can therefore be expressed as

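S_W(x, y) = DE_W^{+}(x, y) \qquad (3)

where the superscript + denotes the positive part: positive values are kept and negative values are set to 0.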

Because of this sensitivity to stroke width, the model can extract characters only when the stroke width is known in advance, and only strokes of the specified width. In many practical applications, however, the stroke width cannot be known in advance.

For these reasons, the improved bilateral model of the present invention is defined as follows:

DE(x) = f(x) - \min_{w=W_l}^{W_h} \left\{ \min_{i=1}^{w-1} \left\{ \max\left( f(x-i),\ f(x-i+w) \right) \right\} \right\} \qquad (4)

where W_l and W_h denote the preset minimum and maximum stroke widths, respectively. Fig. 2 shows the bilateral model in 1-D. The stroke intensity reaches its peak response when w equals the actual stroke width. In this model, w traverses all stroke widths in the interval [W_l, W_h], so text strokes obtain the maximum response. The existing bilateral model shown in Fig. 1 behaves differently. If the preset stroke width W is smaller than the actual stroke width, the stroke is missed. If W is larger than the actual stroke width and the background color is close to the text color, the stroke also goes undetected.
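The sketch below is one direct reading of Eq. (4) in Python. It uses the same assumptions as the sketch of Eq. (1): bright strokes on a dark background, with out-of-range indices clamped. `de_multi` is a hypothetical name. Setting W_l = W_h = W reduces it to Eq. (1), which makes the contrast with the single-width model easy to check.

```python
def de_multi(f, w_low, w_high):
    """1-D improved bilateral-model response (Eq. (4)) over widths w_low..w_high.

    Bright strokes on a dark background; out-of-range indices are clamped
    to the nearest sample (assumption, not stated in the source).
    """
    if w_low < 2 or w_high < w_low:
        raise ValueError("need 2 <= w_low <= w_high")
    n = len(f)

    def at(k):
        return f[min(max(k, 0), n - 1)]

    out = []
    for x in range(n):
        # The outer min over w and the inner min over i collapse into a single
        # min over all (w, i) pairs, so whichever width fits best wins.
        worst = min(
            max(at(x - i), at(x - i + w))
            for w in range(w_low, w_high + 1)
            for i in range(1, w)
        )
        out.append(f[x] - worst)
    return out


# A 1-pixel stroke at x=2 and a 3-pixel stroke at x=6..8.
f = [0, 0, 10, 0, 0, 0, 10, 10, 10, 0, 0]
single = de_multi(f, 2, 2)  # equivalent to Eq. (1) with W = 2
multi = de_multi(f, 2, 4)   # widths 2..4 are all tried
```

Here `single` responds with 10 on the thin stroke but 0 on every pixel of the wide stroke. `multi` responds with 10 on both strokes.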

In 2-D, the stroke intensity can be expressed as

DE_W(x, y) = \max_{d=0}^{3} \left\{ DE_d(x, y) \right\} \qquad (5)

where d = 0, 1, 2, 3 denote the horizontal, vertical, diagonal, and anti-diagonal directions, respectively. This model can extract all strokes whose width lies in [W_l, W_h], so it preserves character integrity well. Even if the stroke thickness is uneven, the present invention can therefore extract the strokes completely from complex-background images or video frames.
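Eqs. (3) and (5) can be sketched as follows for a small grayscale image stored as a list of rows. The sketch rests on three assumptions. DE_d applies the 1-D improved model of Eq. (4) along each of the four directions. A direction step moves one pixel in x and/or y. Out-of-range coordinates are clamped. `stroke_image` and `DIRECTIONS` are hypothetical names.

```python
# (dx, dy) unit steps in image coordinates (y grows downward):
# horizontal, vertical, diagonal, anti-diagonal.
DIRECTIONS = [(1, 0), (0, 1), (1, 1), (1, -1)]


def stroke_image(img, w_low, w_high):
    """S(x, y) = max(0, max_d DE_d(x, y)) with DE_d from Eq. (4) along d."""
    h, wid = len(img), len(img[0])

    def at(x, y):
        # Clamp coordinates to the image (border handling is an assumption).
        return img[min(max(y, 0), h - 1)][min(max(x, 0), wid - 1)]

    out = [[0] * wid for _ in range(h)]
    for y in range(h):
        for x in range(wid):
            best = None
            for dx, dy in DIRECTIONS:
                # Flanking pixels sit i steps behind and (w - i) steps ahead.
                worst = min(
                    max(at(x - i * dx, y - i * dy),
                        at(x + (w - i) * dx, y + (w - i) * dy))
                    for w in range(w_low, w_high + 1)
                    for i in range(1, w)
                )
                resp = img[y][x] - worst
                best = resp if best is None else max(best, resp)
            out[y][x] = max(best, 0)  # positive part, Eq. (3)
    return out


# A vertical bright bar, 2 pixels wide, on a dark background.
img = [[0, 0, 10, 10, 0, 0] for _ in range(5)]
s = stroke_image(img, 2, 3)
```

The bar responds through the horizontal direction, because its flanking pixels are dark. The background stays at 0, because its best response comes from the vertical direction, where every pixel is equally dark.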

Character strokes are extracted with the improved bilateral model described above as follows:

The above process can extract complete character strokes. However, the model can also mistakenly extract stroke-like connected components between strokes. Text extracted with this model therefore still contains inter-stroke noise that must be removed; this is the image post-processing stage. These connected components are the noise between strokes. An example is the spurious stroke between the strokes of the character glossed as "discipline" in Fig. 1.

The denoising method adopted in this embodiment is as follows. First, two-level global thresholding binarizes the extracted stroke image to obtain a binary stroke image. Then, edge detection and hole filling build a template that locates each character. Finally, an AND operation on the template and the binary stroke image yields clean, complete text strokes.

The complete text recognition method is described in detail below with an example.

Fig. 3 is a flowchart of an image-text separation method provided by an embodiment of the present invention. The method can extract clear, complete text from video or complex images, as follows:

S301: Perform image enhancement on the original text line.

This is an optional preprocessing step whose purpose is to highlight stroke edges and details. Image enhancement selectively strengthens certain information in an image, such as edges, contours, and contrast, to make later processing and analysis easier.

Many image enhancement methods exist, such as methods based on mathematical morphology or on rough set theory. This embodiment uses the Laplacian sharpening operator to enhance the original text line. The Laplacian operator is an image enhancement operator based on second-order derivatives. It is isotropic: the filter's response does not depend on the direction of intensity changes in the image. The basic Laplacian image enhancement can be expressed as:

g(x, y) = 5f(x, y) - \left[ f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1) \right] \qquad (6)

With reference to Fig. 4, Fig. 4 a is former text image synoptic diagram, and Fig. 4 b is that laplacian image strengthens synoptic diagram.As can be seen from the figure, strengthened the contrast of gray scale sudden change place in the image, the detail section in the image is enhanced by Laplace transform.

S302: Extract text strokes from the video or image using the improved bilateral model.

Strokes are extracted with the method described above, and the resulting stroke image is a grayscale image. Fig. 4c shows the extracted stroke image; its text is glossed literally as "he say you this age". The image still contains background and a large amount of noise, which is removed as follows.

S303: Convert the extracted stroke image into a binary image, thereby removing the background.

Binarization removes the background. This embodiment uses two-level global thresholding to convert the grayscale stroke image into a binary image. Global thresholding is one kind of thresholding, a region-based image segmentation technique. Its basic principle is to divide image pixels into several classes by setting different feature thresholds. If the pixels are divided into black and white, the segmentation result is a binary image, and the process is called image binarization. Global thresholding uses information from the whole image to find an optimal segmentation threshold. There may be a single threshold or several.

Two-level global thresholding is a segmentation method built on global thresholding. In this embodiment, image binarization with two-level global thresholding proceeds as follows:

First, two segmentation thresholds are chosen for the stroke image. The low threshold gives the low-threshold binary image, and the high threshold gives the high-threshold binary image. Figs. 4d and 4e show these two images, respectively. The text in the low-threshold binary image is clearer and more complete but includes more background. The high-threshold binary image contains less noise, but its text is more fragmented.

This embodiment derives the low and high thresholds from the Otsu threshold. The low threshold is α times the Otsu threshold, and the high threshold is β times the Otsu threshold (β > α). The maximum between-class variance method, also called Otsu's method, was proposed by the Japanese scholar Otsu in 1979. It selects a threshold adaptively. It is a simple, efficient way to compute a single threshold for converting a grayscale image into a binary image. The algorithm analyzes the histogram of the input grayscale image and splits it into two parts so that the between-class variance of the two parts is maximized. The split point is the threshold.
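The sketch below computes the Otsu threshold from a 256-bin histogram by maximizing the between-class variance, then scales it by α and β. The source gives no values for α and β, so the defaults 0.8 and 1.2 are placeholders. The function names are hypothetical, and pixels are assumed to be integers in 0..255.

```python
def otsu_threshold(pixels, levels=256):
    """Return t maximizing the between-class variance of {<= t} vs {> t}."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * c for i, c in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the variance is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to sigma_B^2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def two_level_thresholds(pixels, alpha=0.8, beta=1.2):
    """Low/high thresholds as alpha and beta times the Otsu threshold.

    alpha=0.8 and beta=1.2 are placeholder values, not from the source.
    """
    if not beta > alpha:
        raise ValueError("beta must exceed alpha")
    t = otsu_threshold(pixels)
    return alpha * t, beta * t


low, high = two_level_thresholds([0, 1, 2, 3, 7, 8, 9, 10])
```

When several split points tie, this version keeps the first one. Library routines such as scikit-image's `threshold_otsu` may resolve ties or bin the histogram differently.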

Next, the pixels of the high-threshold binary image are scanned. Whenever a stroke pixel is found, the pixel at the corresponding position in the low-threshold binary image becomes a seed point. A connected component is then searched for from this seed point in the low-threshold binary image. These steps repeat until all pixels of the high-threshold binary image have been scanned. The connected components extracted from the low-threshold binary image form the converted binary image.

Preferably, when a stroke pixel is found in the high-threshold binary image, the connected component containing that pixel is also searched for in the high-threshold binary image, and its pixels are marked as scanned.

For example, in Figs. 4d and 4e the background is black (pixel value 0) and the text strokes are white (pixel value 1). Binarization proceeds as follows. Each pixel of the high-threshold binary image is scanned in turn, top to bottom and left to right. For each "1" pixel found, the corresponding pixel in the low-threshold binary image seeds a search for a connected component, i.e., a stroke. At the same time, the connected component containing that "1" pixel is found in the high-threshold binary image, and all of its pixels are set to 0. These pixels therefore do not trigger a repeated search in the low-threshold binary image. Scanning continues until no "1" pixel remains in the high-threshold binary image. Fig. 4f shows the binarization result.
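The scan described above resembles hysteresis thresholding. The following sketch follows it step by step on a grayscale stroke image stored as a list of rows. It assumes that a pixel is foreground when its value is strictly greater than the threshold and that connected components are 8-connected; the source specifies neither. `two_level_binarize` is a hypothetical name.

```python
from collections import deque

NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]


def two_level_binarize(gray, low_t, high_t):
    """Keep low-threshold components that contain a high-threshold pixel."""
    h, w = len(gray), len(gray[0])
    low = [[1 if v > low_t else 0 for v in row] for row in gray]
    high = [[1 if v > high_t else 0 for v in row] for row in gray]
    out = [[0] * w for _ in range(h)]

    def consume(mask, y0, x0):
        """Collect the 8-connected 1-component at (y0, x0), zeroing it in mask."""
        if not mask[y0][x0]:
            return []  # already consumed by an earlier seed
        mask[y0][x0] = 0
        queue, comp = deque([(y0, x0)]), []
        while queue:
            y, x = queue.popleft()
            comp.append((y, x))
            for dy, dx in NEIGHBORS_8:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                    mask[ny][nx] = 0
                    queue.append((ny, nx))
        return comp

    for y in range(h):          # top to bottom
        for x in range(w):      # left to right
            if high[y][x]:
                consume(high, y, x)  # mark the whole high component as scanned
                for cy, cx in consume(low, y, x):  # grow the seed in the low image
                    out[cy][cx] = 1
    return out


gray = [[0, 50, 0, 0, 0],
        [0, 90, 0, 60, 0],
        [0, 50, 0, 0, 0]]
binary = two_level_binarize(gray, low_t=40, high_t=80)
```

In this example, the vertical segment in column 1 survives because it contains a pixel above the high threshold. The isolated pixel with value 60 passes only the low threshold and is dropped as noise.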

Two-level global thresholding thus combines the low-threshold and high-threshold binary images. It keeps the clarity and completeness of the low-threshold binary image, and it uses the lower noise of the high-threshold binary image to remove noise from the low-threshold image. This binarization method therefore reduces background noise while yielding clear, complete text.

However, the binarization result still contains noise. In particular, the noise between adjacent strokes is not completely removed, so further denoising is needed.

S304: Denoise the binary image to remove the noise between strokes.

Common denoising methods in image processing include erosion and dilation from mathematical morphology. Erosion and dilation can fill holes and remove isolated noise points, but they also damage the shape and details of the target. For text in particular, they can severely damage complex structures and edge details.

To address these problems, the present invention proposes a denoising method based on edge detection. It removes noise while preserving the complex structure and edge details of the characters. Fig. 5 compares the traditional method with the present invention. Fig. 5a shows the original text image. Fig. 5b shows the denoising result of the conditional dilation method. Fig. 5c shows the denoising result of the method of the present invention. As the figure shows, the method of the present invention removes the background while preserving shape integrity.

False detections often occur between adjacent strokes. The basic idea of the present invention is to design a template from the edges of the text and use it to locate each character, thereby removing false detections between strokes. The procedure has four steps. First, edge detection on the original text image gives the edge image shown in Fig. 6a, and the holes enclosed by edges are filled to obtain the required template, shown in Fig. 6b. Second, an AND operation is performed on the template and the binary stroke image. Third, this keeps the pixels of the binary stroke image at the template positions and yields text strokes with noise removed, shown in Fig. 6c. Fourth, very small connected components are filtered out, giving a good-quality text segmentation result, shown in Fig. 6d.

In the binary stroke image obtained by two-level global thresholding, noise lies mainly between strokes. The template, by contrast, mainly contains edge noise, i.e., noise located in the background. The AND operation therefore extracts exactly the pixels of the binary stroke image at template positions. This removes background noise effectively while preserving shape integrity.

The edge-detection-based denoising method is described in detail below. Fig. 7 shows its flowchart, and the steps are as follows:

S701: Perform edge detection on the original text image (Fig. 4a) using the Canny edge detection operator.

There are many edge detection methods. This embodiment uses the Canny edge detector, which is widely regarded as one of the most effective edge detection operators. It works as follows:

1) Smooth the image with a Gaussian filter of standard deviation σ to reduce noise;

2) Compute the local gradient at every point: the gradient magnitude g(x, y) = [G_x^2 + G_y^2]^{1/2} and the edge direction α(x, y) = arctan(G_y / G_x). Here G_x and G_y are the first-order partial derivatives in the x and y directions, respectively;

3) The edge points found in step 2 produce ridges in the gradient magnitude image. The algorithm tracks along the tops of all ridges and sets every pixel not on a ridge top to 0, so the output is a thin line. This is called non-maximum suppression. The ridge pixels are then thresholded with two thresholds T1 and T2, where T1 < T2. Ridge pixels greater than T2 are called strong edge pixels, and ridge pixels between T1 and T2 are called weak edge pixels;

4) Finally, the algorithm performs edge linking by adding the weak edge pixels that are 8-connected to strong edge pixels.
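Step 2 can be sketched with Sobel derivative kernels. Sobel is an assumption here, since the text only says first-order partial derivatives. The sketch uses `math.atan2` instead of arctan(G_y / G_x) so that G_x = 0 does not cause a division by zero. Smoothing, non-maximum suppression, and the double threshold are omitted. In practice a library routine such as OpenCV's `cv2.Canny` would normally be used for the full detector. `sobel_gradient` is a hypothetical name.

```python
import math


def sobel_gradient(img):
    """Per-pixel gradient magnitude and direction (radians) using Sobel kernels."""
    h, w = len(img), len(img[0])

    def at(x, y):
        # Replicate border pixels (assumption).
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # G_x: right column minus left column, weights 1-2-1.
            gx = ((at(x + 1, y - 1) + 2 * at(x + 1, y) + at(x + 1, y + 1))
                  - (at(x - 1, y - 1) + 2 * at(x - 1, y) + at(x - 1, y + 1)))
            # G_y: bottom row minus top row, weights 1-2-1.
            gy = ((at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1))
                  - (at(x - 1, y - 1) + 2 * at(x, y - 1) + at(x + 1, y - 1)))
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.atan2(gy, gx)  # edge direction alpha(x, y)
    return mag, ang


vertical_edge = [[0, 0, 10, 10] for _ in range(3)]
mag, ang = sobel_gradient(vertical_edge)
```

For the vertical step edge above, the pixel at row 1, column 1 has magnitude 40 and direction 0. The gradient points straight across the edge, along x.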

Preferably, S702: Perform edge linking on the edge detection result.

Edge linking joins similar points within a small neighborhood to form an edge. Two main properties determine how similar edge pixels are: 1) the gradient of the edge pixels, and 2) the direction of the gradient. This embodiment works on the binary image produced by Canny edge detection. Within a small neighborhood, two pixels along the horizontal, vertical, diagonal, or anti-diagonal direction are connected if both are 1-pixels and their gradient direction is greater than a predetermined threshold.

S703: Fill the holes enclosed by edges to obtain the template.

Hole filling fills the hole regions of an image. Many hole-filling methods exist; the one adopted in this embodiment is as follows:

Let A denote an image containing an 8-connected boundary region. The goal is to fill the entire hole region with 1s, as follows:

X_k = (X_{k-1} \oplus B) \cap A^c, \qquad k = 1, 2, 3, \ldots

where ⊕ denotes dilation, B is a symmetric structuring element, and A^c is the complement of A. X_0 is an array the same size as the original binary image, with a 1 at a point inside the hole and 0 elsewhere. The algorithm terminates at iteration step k if X_k = X_{k-1}. The union of X_k and A then contains the filled set and its boundary.
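A direct sketch of the iteration X_k = (X_{k-1} ⊕ B) ∩ A^c follows. It takes B to be the 3×3 cross, the usual choice for 8-connected boundaries; the text only says B is symmetric. It also takes one seed point known to lie inside the hole. `fill_hole` is a hypothetical name, and a template with many holes would need one seed per hole. A common alternative, which is not the method described here, is to flood-fill the background from the image border and invert it.

```python
def fill_hole(A, seed):
    """Morphological hole filling from one seed; returns X_k OR A."""
    h, w = len(A), len(A[0])
    # B: 3x3 cross including the origin (assumed structuring element).
    CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    sy, sx = seed
    if A[sy][sx]:
        raise ValueError("seed must lie inside the hole, not on the boundary")

    X = [[0] * w for _ in range(h)]
    X[sy][sx] = 1                      # X_0: a single point inside the hole
    while True:
        nxt = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                if A[y][x]:
                    continue           # intersect with the complement A^c
                # Dilation by the symmetric B: some cross neighbour is set in X.
                if any(0 <= y + dy < h and 0 <= x + dx < w and X[y + dy][x + dx]
                       for dy, dx in CROSS):
                    nxt[y][x] = 1
        if nxt == X:                   # X_k == X_{k-1}: converged
            break
        X = nxt
    return [[X[y][x] | A[y][x] for x in range(w)] for y in range(h)]


ring = [[1, 1, 1, 1, 1],
        [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 1, 1, 1, 1]]
filled = fill_hole(ring, (2, 2))
```

Starting from the centre of the closed ring, the fill spreads until it reaches the boundary. The union with A is then a solid 5×5 block.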

S704: Perform an AND operation on the template and the binary stroke image obtained above. This keeps the pixels of the binary stroke image at the template positions and effectively removes background noise.

Preferably, S705: Filter out very small connected components, i.e., small noise, to obtain the final text segmentation result.
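Steps S704 and S705 can be sketched together. The sketch takes a pixel-wise AND of the binary stroke image and the template, then drops 8-connected components smaller than a size limit. The source only says that "very small" components are removed. The `min_size` parameter, the 8-connectivity, and the name `and_then_filter` are all assumptions made for illustration.

```python
from collections import deque


def and_then_filter(binary, template, min_size):
    """S704: AND with the template; S705: drop components below min_size pixels."""
    h, w = len(binary), len(binary[0])
    masked = [[binary[y][x] & template[y][x] for x in range(w)] for y in range(h)]

    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not masked[y][x] or seen[y][x]:
                continue
            # Breadth-first search over the 8-connected component at (y, x).
            seen[y][x] = True
            comp, queue = [], deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and masked[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(comp) >= min_size:  # keep only sufficiently large components
                for cy, cx in comp:
                    out[cy][cx] = 1
    return out


binary = [[1, 1, 0, 0, 1],
          [1, 1, 0, 0, 0],
          [0, 0, 0, 1, 0]]
template = [[1, 1, 1, 1, 1],
            [1, 1, 1, 1, 1],
            [0, 0, 0, 0, 0]]
clean = and_then_filter(binary, template, min_size=2)
```

Here the pixel in the bottom row lies outside the template and is removed by the AND. The isolated pixel in the top-right corner survives the AND but is removed as a component smaller than `min_size`.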

Corresponding to the above text recognition method, the present invention also provides an embodiment of an image-text separation system. Fig. 8 shows the structure of this system. The system mainly comprises a model design unit U801, a stroke extraction unit U802, a binarization unit U803, and a denoising unit U804. Preferably, it also comprises a preprocessing unit U805.

The model design unit U801 improves the original bilateral model so that all strokes whose width lies within a preset range can be extracted. This solves the problem that the original model can only extract strokes of a specified width. The preprocessing unit U805 performs image enhancement on the original text image to highlight stroke edges and details. In this embodiment, it applies the Laplacian sharpening operator to the original text line. The stroke extraction unit U802 extracts the complete stroke image from the original text image using the improved bilateral model.

The binarization unit U803 converts the extracted stroke image into a binary image. In this embodiment, it uses two-level global thresholding to obtain a binary stroke image. Accordingly, the binarization unit U803 further comprises:

A two-level global threshold subunit, which applies two threshold levels to the extracted stroke image to obtain a corresponding low-threshold binary image and high-threshold binary image;

A connected component extraction subunit, which scans the pixels of the high-threshold binary image. Whenever a stroke pixel is found, the pixel at the corresponding position in the low-threshold binary image is used as a seed point, and a connected component is grown from this seed point in the low-threshold binary image. These steps are repeated; once all pixels of the high-threshold binary image have been scanned, the connected components extracted from the low-threshold binary image form the converted binary image.
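The two-level global threshold with seeded growth described above behaves like a hysteresis threshold. The sketch below illustrates it under several assumptions. Strokes have high values in the extracted stroke image, `high >= low` (so every high-threshold pixel is also a low-threshold pixel), and components are 8-connected. The threshold values and the function name are hypothetical. Seeds already covered by an extracted component are skipped, which avoids regrowing the same component and is similar in effect to marking pixels as scanned.

```python
from collections import deque

import numpy as np

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def two_level_binarize(stroke_img: np.ndarray, low: float, high: float) -> np.ndarray:
    """Grow components in the low-threshold image from seeds in the high-threshold image."""
    low_bin = stroke_img >= low      # low-threshold binary image
    high_bin = stroke_img >= high    # high-threshold binary image (assumes high >= low)
    out = np.zeros_like(low_bin)
    h, w = low_bin.shape
    for y, x in zip(*np.nonzero(high_bin)):   # scan stroke pixels of the high image
        if out[y, x]:
            continue                           # already inside an extracted component
        out[y, x] = True                       # seed at the same position in the low image
        queue = deque([(y, x)])
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = cy + dy, cx + dx
                if 0 <= ny < h and 0 <= nx < w and low_bin[ny, nx] and not out[ny, nx]:
                    out[ny, nx] = True
                    queue.append((ny, nx))
    return out                                 # union of the grown components
```

Weak stroke pixels survive only when they connect to a strong one. This retains thin parts of strokes while rejecting isolated weak responses.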

The denoising unit U804 denoises the binary image, mainly removing the noise between strokes. This embodiment uses a denoising method based on edge detection, which removes noise while preserving the complex structure and edge details of the characters. Accordingly, the denoising unit U804 further comprises:

A template construction subunit, which performs edge detection on the original text image, connects the detected edges, and then fills the holes enclosed by the edges to obtain a template; the template locates the position of each character. In this embodiment, the template construction subunit uses the Canny edge detection operator to detect edges in the original text image;

A denoising subunit, which performs an AND operation between the template and the binary image, keeping the pixels of the binary image that lie at template positions, to obtain text strokes with the noise removed.
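The template construction and AND steps can be illustrated with a simplified, NumPy-only sketch that makes three substitutions. First, Canny is replaced by a thresholded gradient-magnitude detector; it has no smoothing, non-maximum suppression or hysteresis, so it is not Canny. Second, edge connection is approximated by a 3×3 dilation, because the source does not specify the linking method. Third, holes are filled by flooding the background from the image border rather than by the iterative formula given earlier. The names and the `edge_thresh` value are hypothetical.

```python
from collections import deque

import numpy as np


def edge_map(img: np.ndarray, thresh: float) -> np.ndarray:
    """Stand-in edge detector (NOT Canny): thresholded central-difference gradient."""
    gy, gx = np.gradient(img.astype(np.float64))
    return np.hypot(gx, gy) >= thresh


def connect_edges(mask: np.ndarray) -> np.ndarray:
    """Edge-connection stand-in: 3x3 square dilation bridges one-pixel gaps."""
    p = np.pad(mask, 1)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + h, dx:dx + w]
    return out


def fill_holes(mask: np.ndarray) -> np.ndarray:
    """Fill enclosed regions: any background pixel not reachable from the border."""
    h, w = mask.shape
    outside = np.zeros_like(mask)
    queue = deque((y, x) for y in range(h) for x in range(w)
                  if (y in (0, h - 1) or x in (0, w - 1)) and not mask[y, x])
    for y, x in queue:
        outside[y, x] = True                  # border background pixels are outside
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] and not outside[ny, nx]:
                outside[ny, nx] = True
                queue.append((ny, nx))
    return ~outside                           # edges plus everything they enclose


def denoise(original: np.ndarray, binary: np.ndarray, edge_thresh: float = 30.0) -> np.ndarray:
    """Build the template from the original image, then AND it with the binary image."""
    template = fill_holes(connect_edges(edge_map(original, edge_thresh)))
    return binary.astype(bool) & template
```

The dilation makes the template slightly larger than the character outline. That trade-off favours keeping stroke pixels over removing noise right at the character border.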

In summary, the system of the present invention can completely extract strokes of uneven thickness from images or video frames with complex backgrounds. Moreover, the denoising performed by the system effectively removes background noise while preserving the integrity of the character shapes.

For the parts of the system shown in Fig. 8 that are not described in detail, refer to the corresponding parts of the method shown in Figs. 1 to 7; for brevity, they are not repeated here.

The video OCR image-text separation method and system provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and embodiments of the invention; the description of the above embodiments is only intended to help in understanding the method of the invention and its core idea. Meanwhile, a person of ordinary skill in the art may, following the idea of the invention, make changes to both the specific embodiments and the scope of application. In summary, this description should not be construed as limiting the invention.

Claims

1. An image-text separation method, characterized by comprising:

denoising the binary image, comprising:

performing edge detection on the original text image, and then filling the holes enclosed by the edges to obtain a template;

performing an AND operation between the template and the binary image, and extracting the pixels of the binary image at the template positions, to obtain text strokes with the noise removed.

2. The method according to claim 1, characterized in that the step of converting the extracted stroke image into a binary image comprises:

applying two threshold levels to the extracted stroke image to obtain a corresponding low-threshold binary image and high-threshold binary image;

cyclically scanning the pixels of the high-threshold binary image and, when a stroke pixel is found, using the pixel at the corresponding position in the low-threshold binary image as a seed point and growing a connected component from this seed point in the low-threshold binary image;

after all pixels of the high-threshold binary image have been scanned, taking the connected components extracted from the low-threshold binary image as the converted binary image.

3. The method according to claim 2, characterized in that, when a stroke pixel is found in the high-threshold binary image, the method further comprises:

growing a connected component from this pixel in the high-threshold binary image, and marking the pixels of that connected component as scanned.

4. The method according to claim 1, characterized in that, after edge detection is performed on the original text image and before the holes enclosed by the edges are filled, the method further comprises:

performing edge connection on the detection result.

5. The method according to claim 1, characterized in that, before the stroke image is extracted from the original text image, the method further comprises:

performing image enhancement on the original text image.

6. An image-text separation system, characterized by comprising:

a denoising unit, configured to denoise the binary image, the denoising unit further comprising:

a template construction subunit, configured to perform edge detection on the original text image and then fill the holes enclosed by the edges to obtain a template;

7. The system according to claim 6, characterized in that the binarization unit further comprises:

a connected component extraction subunit, configured to cyclically scan the pixels of the high-threshold binary image and, when a stroke pixel is found, use the pixel at the corresponding position in the low-threshold binary image as a seed point and grow a connected component from this seed point in the low-threshold binary image; after all pixels of the high-threshold binary image have been scanned, the connected components extracted from the low-threshold binary image form the converted binary image.

8. The system according to claim 6, characterized by further comprising:

a preprocessing unit, configured to perform image enhancement on the original text image;

wherein the stroke extraction unit extracts the stroke image from the original text image after image enhancement.