CN119763090B

CN119763090B - Movie subtitle extraction method and device under background complex change scene based on OCR and color preprocessing

Info

Publication number: CN119763090B
Application number: CN202411815294.3A
Authority: CN
Inventors: 周晟; 吴雨轩; 卜佳俊; 沈铭; 李亮城
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2024-12-11
Filing date: 2024-12-11
Publication date: 2025-09-26
Anticipated expiration: 2044-12-11
Also published as: CN119763090A

Abstract

The invention discloses a method and a device for extracting movie subtitles in scenes with complex background changes, based on OCR and color preprocessing. The method first crops and locates the subtitle region of the movie, extracts the color feature of the subtitles, and then extracts the subtitle text after preprocessing by subtitle color. This improves on traditional movie subtitle extraction methods, whose results are degraded by confusion with background elements, color changes, and similar causes. The method helps improve the recognition accuracy of OCR in movie scenes and overcomes the challenges brought by background complexity while maintaining efficiency.

Description

Movie subtitle extraction method and device under background complex change scene based on OCR and color preprocessing

Technical field:

The invention relates to optical character recognition technology, and in particular to a movie subtitle extraction method for scenes with complex background changes, based on OCR and preprocessing by a specific color.

Background art:

In text recognition for movie subtitles, scene changes are often relatively complex. Movie scenes often contain rich and varied background elements such as fast-moving objects, lighting changes, and special effects. Strong light and shadow effects may directly interfere with the clarity of the subtitles. Together, these factors make subtitle extraction from movies exceptionally complex.

Traditional movie subtitle extraction methods rely solely on optical character recognition (OCR), but they often perform poorly: subtitles may be confused with background elements or affected by color changes, which makes them hard to recognize accurately. It is therefore difficult to extract subtitles accurately from movie scenes using OCR alone.

These problems currently need to be solved.

Summary of the invention:

The invention aims to overcome the defects of the prior art by combining the advantages of OCR and preprocessing, and provides a method and a device for extracting movie subtitles in scenes with complex background changes, based on OCR and color preprocessing.

In order to solve the technical problems, the method for extracting the movie subtitle under the background complex change scene based on OCR and color preprocessing comprises the following steps:

S110, sampling movie frames to obtain the specific positions of text frames: randomly sampling a plurality of picture frames from the movie, and obtaining all text frame positions in those picture frames by an OCR method;

S120, acquiring subtitle position information by using a clustering algorithm: clustering the text frames to obtain the class with the largest number of members, taking the center point coordinate of that class as the center point of the subtitle position, counting the mode of the text frame heights in that class and taking it as the height of the subtitle position, and taking the width of the movie as the width of the subtitle position;

S130, calculating subtitle color features according to the pixel point modes;

S140, extracting subtitle text according to subtitle color preprocessing.

Further, the step S110 of sampling the movie frame to obtain the specific position of the text frame specifically includes:

S1101, for a film, randomly sampling n picture frames from the film;

S1102, for the n picture frames, calculating the positions of all the text frames by an OCR method, wherein the number of the text frames is S_n;

S1103, representing the specific position of a text frame by {X, Y, H}, where X, Y are the center point coordinates of the text frame and H is the height of the text frame.
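Steps S1101 to S1103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the corner-format box `(left, top, right, bottom)`, and the `detect_boxes` callable (a stand-in for whatever OCR text detector is used) are all hypothetical and do not appear in the source.

```python
import random


def sample_frame_indices(total_frames, n, seed=None):
    """S1101: randomly pick n distinct frame indices from the movie."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_frames), min(n, total_frames)))


def box_to_xyh(box):
    """S1103: convert an assumed (left, top, right, bottom) box to (X, Y, H)."""
    left, top, right, bottom = box
    return ((left + right) / 2, (top + bottom) / 2, bottom - top)


def collect_text_frames(frames, detect_boxes):
    """S1102: run the (hypothetical) OCR detector on each sampled frame
    and pool every detected text frame as an (X, Y, H) triple."""
    pooled = []
    for frame in frames:
        pooled.extend(box_to_xyh(b) for b in detect_boxes(frame))
    return pooled  # len(pooled) plays the role of S_n
```

The pooled list mixes boxes from all sampled frames, which is what lets the later clustering step find the position where text appears most consistently.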

Further, the step S120 of obtaining subtitle position information by using a clustering algorithm specifically includes:

S1201, carrying out Kmeans clustering on the positions X, Y of the S_n text frames to obtain the class with the largest number of members, and taking the center point coordinate of that class as the center point of the subtitle position;

S1202, counting the mode of the text frame height H within that class, and taking the mode as the height of the subtitle position;

S1203, taking the width of the movie as the width of the subtitle position.
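A minimal sketch of S1201 to S1203 follows, using a plain Lloyd-style k-means written with the standard library only. The source does not state the number of clusters, the initialization, or the iteration limit, so `k`, `iters`, and `seed` are assumptions made for illustration, as are the function names.

```python
import random
from collections import Counter
from statistics import mode


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed initialization: k random points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                        + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def subtitle_region(text_frames, movie_width, k=2):
    """text_frames: list of (X, Y, H). Returns (center_x, center_y, width, height)."""
    points = [(x, y) for x, y, _ in text_frames]
    centers, labels = kmeans(points, k)
    biggest = Counter(labels).most_common(1)[0][0]          # S1201: largest class
    heights = [h for (_, _, h), lab in zip(text_frames, labels) if lab == biggest]
    cx, cy = centers[biggest]
    return cx, cy, movie_width, mode(heights)               # S1202 and S1203
```

Taking the mode of the heights rather than the mean keeps one unusually tall or short detection from skewing the subtitle band.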

Further, the step S130 of calculating the subtitle color feature according to the pixel mode specifically includes:

S1301, for the picture frames sampled in S1101, cropping the movie picture according to the subtitle position, and denoting the pixel values of the cropped picture by Img_{i,j};

S1302, for each column of pixels in the cropped picture, counting the most frequent pixel value, namely the mode of the pixel values in that column, and denoting the mode of column j by C_j;

S1303, calculating the mode of {C_1, …, C_j, …, C_w} and recording it as M, wherein w is the number of pixel columns in the picture;

S1304, repeating the above operations for all the cropped picture frames, and taking the mode of the resulting set {M} as the subtitle color information feature.

Further, in the step S130, the subtitle color feature storage format is required to be in RGB three-channel mode.
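The three nested mode computations of S1302 to S1304 can be sketched as below. This is an illustration only: it assumes each cropped picture is a list of rows of `(R, G, B)` tuples, and the function names are hypothetical. Ties between equally frequent values are resolved to the first one encountered, which the source does not specify.

```python
from collections import Counter


def mode_of(values):
    """Most frequent element; ties resolve to the first one encountered."""
    return Counter(values).most_common(1)[0][0]


def frame_color_mode(crop):
    """crop: list of rows, each a list of (R, G, B) tuples (Img_{i,j}).
    Returns M, the mode of the per-column modes C_1..C_w (S1302, S1303)."""
    width = len(crop[0])
    column_modes = [mode_of(row[j] for row in crop) for j in range(width)]
    return mode_of(column_modes)


def subtitle_color(crops):
    """S1304: mode of the per-frame values M over all sampled crops."""
    return mode_of(frame_color_mode(c) for c in crops)
```

Keeping each pixel as a whole RGB tuple means the mode is taken over full colors rather than per channel, which matches the RGB three-channel storage requirement stated above.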

Further, the step S140 of extracting the subtitle text according to the subtitle color preprocessing specifically includes:

S1401, extracting all picture frames in a film, and cutting out pictures according to subtitle positions;

S1402, binarizing the cropped picture, setting pixels whose value equals the subtitle color feature to 255 and all other pixels to 0;

S1403, sending the binarized picture to OCR for subtitle text extraction.
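Steps S1401 to S1403 can be sketched as follows, again as an illustration under assumptions rather than the authors' implementation. A frame is assumed to be a list of rows of `(R, G, B)` tuples, the subtitle band spans the full width (per S1203), and `recognize` is a hypothetical stand-in for the OCR engine.

```python
def crop_subtitle(frame, center_y, height):
    """S1401: cut the full-width band of `height` rows centred on center_y."""
    top = max(0, int(round(center_y - height / 2)))
    bottom = min(len(frame), top + int(height))
    return frame[top:bottom]


def binarize_by_color(crop, color):
    """S1402: 255 where the pixel equals the subtitle color, 0 elsewhere."""
    return [[255 if pixel == color else 0 for pixel in row] for row in crop]


def extract_subtitles(frames, center_y, height, color, recognize):
    """S1403: pass each binarized band to the (hypothetical) OCR callable."""
    return [
        recognize(binarize_by_color(crop_subtitle(f, center_y, height), color))
        for f in frames
    ]
```

The comparison is an exact match, as the step describes. Video compression and anti-aliasing can shift subtitle pixels slightly away from the nominal color, so a practical variant might allow a small tolerance, but that goes beyond what the source states.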

A second aspect of the present invention relates to a device for extracting movie subtitles in scenes with complex background changes based on OCR and color preprocessing, which comprises a memory and one or more processors, wherein executable code is stored in the memory, and the one or more processors, when executing the executable code, implement the method for extracting movie subtitles in scenes with complex background changes based on OCR and color preprocessing according to the present invention.

A third aspect of the present invention relates to a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the method for extracting movie subtitles in scenes with complex background changes based on OCR and color preprocessing as set forth in any one of claims 1 to 6.

The beneficial effects of the invention are as follows. The subtitle extraction method for scenes with complex movie background changes, based on OCR and color preprocessing, uses OCR to collect all text frames in the sampled picture frames, then applies clustering and mode operations to the text frame information to determine the subtitle position. The color feature of the subtitles is determined by statistics taken in the order of columns, rows, and movie frames. The frames are then preprocessed by a binarization operation on this color information and passed to OCR to extract the subtitles. This addresses the inability of existing pure OCR techniques to cope with complex scene changes behind movie subtitles, and thereby improves the accuracy of movie subtitle extraction.

Description of the drawings:

Fig. 1 is a flowchart of the subtitle extraction method for scenes with complex movie background changes, based on OCR and color preprocessing, according to an embodiment of the present invention.

Fig. 2 is a schematic view of the apparatus of the present invention.

Fig. 3 is a diagram showing the effect of the binarization preprocessing of the present invention.

Detailed description:

The invention will now be described in further detail with reference to the accompanying drawings. The drawings are simplified schematic representations which merely illustrate the basic structure of the invention and therefore show only the structures which are relevant to the invention.

Example 1

As shown in fig. 1, this embodiment 1 provides a method for extracting movie subtitles in a background complex changing scene based on OCR and color preprocessing, which improves the adaptability of the prior art to the complex changing scene in the movie, thereby helping to improve the extracting accuracy of the movie subtitles.

Specifically, the method comprises:

S110, sampling a film frame to obtain a specific position of a text frame;

Specifically, n picture frames are randomly sampled from a movie, and all text frame positions in the n picture frames are calculated by an OCR method. The number of text frames is S_n. The specific position of a text frame is represented by {X, Y, H}, where X, Y are the center point coordinates of the text frame and H is the height of the text frame.

S120, acquiring subtitle position information by using a clustering algorithm;

Specifically, Kmeans clustering is performed on the positions X, Y of the S_n text frames to obtain the class with the largest number of members, and the center point coordinate of that class is used as the center point of the subtitle position. The mode of the text frame height H within that class is counted and used as the height of the subtitle position, and the width of the movie is used as the width of the subtitle position.

S130, calculating subtitle color features according to the pixel point modes;

Specifically, for the picture frames sampled in S1101, the movie picture is cropped according to the subtitle position, and the pixel values of the cropped picture are denoted by Img_{i,j}. For each column of pixels in the cropped picture, the most frequent pixel value is counted, that is, the mode of the pixel values in that column, denoted by C_j. The mode of {C_1, …, C_j, …, C_w} is calculated and denoted as M, where w is the number of pixel columns in the picture. The above operations are repeated for all cropped picture frames, and the mode of the resulting set {M} is taken as the subtitle color information feature.

S140, extracting subtitle text according to subtitle color preprocessing.

Specifically, all picture frames in the movie are extracted and cropped according to the subtitle position. A binarization operation is carried out on each cropped picture: pixels whose value equals the subtitle color feature are set to 255, and all other pixels are set to 0. The binarized picture is then sent to OCR for subtitle text extraction.
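For reference, steps S130 and S140 can be written compactly as below. This is a notational restatement only. The symbols M^{(k)} (the value M of the k-th sampled frame), c^{*} (the subtitle color feature), h (the number of pixel rows in the cropped picture), and B_{i,j} (the binarized pixel) are introduced here for illustration and do not appear in the source.

```latex
C_j = \operatorname{mode}\{\mathrm{Img}_{1,j}, \dots, \mathrm{Img}_{h,j}\}, \qquad
M = \operatorname{mode}\{C_1, \dots, C_j, \dots, C_w\}, \qquad
c^{*} = \operatorname{mode}\{M^{(1)}, \dots, M^{(n)}\}

B_{i,j} =
\begin{cases}
255, & \mathrm{Img}_{i,j} = c^{*} \\
0,   & \text{otherwise}
\end{cases}
```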

In summary, the subtitle extraction method for scenes with complex movie background changes, based on OCR and color preprocessing, collects all text frames in the sampled picture frames through OCR, and applies clustering and mode operations to the text frame information to determine the subtitle position. The color feature of the subtitles is determined by statistics taken in the order of columns, rows, and movie frames. The frames are preprocessed by a binarization operation on this color information, and the subtitles are then extracted by OCR. The effect of the binarization preprocessing is shown in Fig. 3. This addresses the inability of existing pure OCR techniques to cope with complex scene changes behind movie subtitles, and thereby improves the accuracy of movie subtitle extraction.

Example 2

The present embodiment relates to a device for extracting movie subtitles in a background complex change scene based on OCR and color preprocessing, which includes a memory and one or more processors, as shown in fig. 2, wherein executable codes are stored in the memory, and the one or more processors are configured to implement the method for extracting movie subtitles in a background complex change scene based on OCR and color preprocessing of embodiment 1 when executing the executable codes.

Example 3

The present embodiment relates to a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the movie subtitle extraction method of embodiment 1 under a complex background change scene based on OCR and color preprocessing.

Taking the above preferred embodiments of the present invention as guidance, persons skilled in the relevant art can make various changes and modifications without departing from the technical idea of the present invention. The technical scope of the present invention is not limited to the content of the description and must be determined according to the scope of the claims.

Claims

1. The method for extracting the movie subtitle under the background complex change scene based on OCR and color preprocessing is characterized by comprising the following steps:

S110, sampling movie frames to obtain the specific positions of text frames: randomly sampling a plurality of picture frames from the movie, and obtaining all text frame positions in those picture frames by an OCR method, wherein sampling movie frames to obtain the specific positions of text frames specifically comprises:

S1101, randomly sampling n picture frames from a film;

S1102, for the n picture frames, calculating all text frame positions by an OCR method, and recording the number of text frames as S_n;

S1103, representing the specific position of a text frame by {X, Y, H}, where X, Y are the center point coordinates of the text frame and H is the height of the text frame;

S130, calculating the subtitle color features according to the pixel mode, wherein the method specifically comprises the following steps:

S1303, calculating the mode of {C_1, …, C_j, …, C_w} and recording it as M, wherein w is the number of pixel columns in the picture;

S1304, repeating the above operations for all the cropped picture frames, and taking the mode of the resulting set {M} as the subtitle color information feature;

S140, extracting subtitle text according to subtitle color preprocessing, wherein the method specifically comprises the following steps:

2. The method for extracting movie subtitles in a complex background changing scene based on OCR and color preprocessing according to claim 1, characterized in that:

the step S120 of obtaining subtitle position information by using a clustering algorithm specifically includes:

S1201, carrying out Kmeans clustering on the positions X, Y of the S_n text frames to obtain the class with the largest number of members, and taking the center point coordinate of that class as the center point of the subtitle position;

S1202, counting the mode of the text frame height H within that class, and taking the mode as the height of the subtitle position;

S1203, taking the width of the movie as the width of the subtitle position.

3. The method for extracting movie subtitles in a complex background changing scene based on OCR and color preprocessing according to claim 1, wherein in step S130, the subtitle color feature is required to be stored in RGB three-channel mode.

4. A device for extracting movie subtitles in a background complex-change scene based on OCR and color preprocessing, comprising a memory and one or more processors, wherein the memory stores executable code, and the one or more processors, when executing the executable code, implement the method for extracting movie subtitles in a background complex-change scene based on OCR and color preprocessing according to any one of claims 1 to 3.

5. A computer-readable storage medium, having stored thereon a program which, when executed by a processor, implements the method for extracting movie subtitles in a complex background changing scene based on OCR and color preprocessing as set forth in any one of claims 1 to 3.