CN101360184A

CN101360184A - System and method for extracting key frame of video

Info

Publication number: CN101360184A
Application number: CN 200810211435
Authority: CN
Inventors: 陈波
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2008-09-22
Filing date: 2008-09-22
Publication date: 2009-02-04
Anticipated expiration: 2028-09-22
Also published as: CN101360184B

Abstract

The invention provides a system for extracting video key frames as well as a method thereof, relating to the video image technique field. The system comprises: a key frame extracting unit which carries out the histogram and grey chart operations to image data of two adjacent frames in video data so as to acquire characteristic vectors composed of the operation results, an Euclidean distance of the characteristic vectors is compared with the preset threshold value, and a lens transform boundary is acquired based on the comparison results and key frames are extracted. The system and the method for extracting video key frames can improve the performance of the video key frames.

Description

System and method for extracting video key frame

Technical Field

The invention relates to the technical field of video images, in particular to a system and a method for extracting video key frames.

Background

With the rapid development of internet technology and imaging display technology, a large amount of multimedia information is transmitted to a network for people to watch, and acquiring multimedia information from the network becomes an indispensable part of people's daily life. The video on demand system, the network video sharing, the video programs and the like continuously improve the appreciation experience of consumers and meet the spiritual culture requirements of people. However, with the tremendous enrichment of multimedia applications, the number of network videos is expanding dramatically, and among many network videos, there are likely many sensitive or unhealthy videos to be uploaded at the same time. Therefore, a significant problem associated with network video is how to effectively audit or analyze the enormous amount of video data to prevent some sensitive or unhealthy content from being distributed over the network.

Since the uploading amount of network videos is very large, the information amount of images and video data is also very large, the contents of the network videos generally need to be observed manually for understanding, and the efficiency is very low if the videos are only audited or analyzed by the manual observation. In order to save time, key frames need to be extracted from the video, and the video needs to be audited or analyzed by displaying or analyzing images of the key frames. The technical scheme adopted by the prior art is as follows: and extracting video key frames according to a fixed time interval, and generating pictures from the key frames. According to the scheme, the key frames are extracted only according to the fixed time interval, so that some key pictures are omitted, if the time interval is shortened, a large amount of labor is wasted when image data of the key frames are audited or analyzed, and therefore the video key frames extracted in the prior art cannot well reflect the general situation of the video, and the performance of extracting the video key frames is low.

Therefore, a new system and method for extracting a video key frame are needed, which can improve the performance of extracting the video key frame.

Disclosure of Invention

An objective of the present invention is to provide a system and a method for extracting a video key frame, which are used to solve the problem of low performance in the prior art.

In order to achieve the purpose of the invention, the system comprises a key frame extraction unit, a histogram and gray-scale map operation is carried out on image data of two adjacent frames in video data to obtain a feature vector consisting of the operation result, the Euclidean distance of the feature vector is compared with the size of a preset threshold value, and a lens conversion boundary is obtained and a key frame is extracted based on the comparison result.

The key frame extraction unit includes:

the operation module is used for performing histogram and gray level image operation on the image data of two adjacent frames, acquiring a feature vector consisting of a frame difference value, a mean difference value and a variance difference value of the gray level image of the two adjacent frames, and performing weighting processing on the frame difference value, the mean difference value and the variance difference value to obtain the Euclidean distance of the feature vector;

and the frame extraction module is connected with the operation module, performs data interaction, compares the Euclidean distance with a preset threshold value, and acquires a lens conversion boundary based on the comparison result to extract a corresponding key frame.

The shot transition boundary includes a shot fade boundary and a shot cut boundary, and the predetermined threshold includes a shot fade threshold and a shot cut threshold.

The frame extraction module is further to: comparing the Euclidean distance with the lens gradual change threshold value and the lens shear threshold value, when the Euclidean distance is larger than the lens gradual change threshold value and smaller than the lens shear threshold value, acquiring a lens gradual change boundary and extracting a corresponding key frame, and when the Euclidean distance is larger than the lens shear threshold value, acquiring a lens shear boundary and extracting a corresponding key frame.

Preferably, the system further comprises:

the decoding unit is used for decoding the video data to obtain decoded image data;

the image processing unit is connected with the decoding unit and the key frame extraction unit and performs data interaction, performs normalization processing on the decoded image data, and transmits the processed image data to the key frame extraction unit;

and the picture synthesis unit is connected with the key frame extraction unit and performs data interaction, and synthesizes the obtained key frames to generate dynamic picture data.

To better achieve the object of the invention, the method comprises:

A. performing histogram and gray-scale map operation on the image data of two adjacent frames to obtain a feature vector consisting of the operation results;

B. comparing the Euclidean distance of the characteristic vector with a preset threshold value, and acquiring a lens conversion boundary according to the comparison result;

C. and extracting the key frame according to the shot conversion boundary.

The feature vector is composed of a frame difference value of two adjacent frames, a mean difference value of a gray level image and a variance difference value of the gray level image, and the Euclidean distance is obtained by weighting the frame difference value, the mean difference value and the variance difference value.

The shot transition boundary comprises a shot fade boundary and a shot cut boundary, the predetermined thresholds comprise a shot fade threshold and a shot cut threshold, and the step B further comprises: comparing the Euclidean distance with the lens gradual change threshold value and the lens shear threshold value, when the Euclidean distance is larger than the lens gradual change threshold value and smaller than the lens shear threshold value, acquiring a lens gradual change boundary, and when the Euclidean distance is larger than the lens shear threshold value, acquiring a lens shear boundary.

The step A is also preceded by: decoding the video data to obtain decoded image data, and performing normalization processing on the decoded image data.

The step C is followed by: and synthesizing the video key frame to generate dynamic picture data and displaying the dynamic picture data.

Therefore, in the process of extracting the video key frames, the difference between the method and the device is that the histogram and the gray-scale map operation is carried out on the image data of two adjacent frames, the shot conversion boundary is obtained according to the operation result, and the corresponding key frames are extracted, so that the extracted key frames can reflect the whole general situation of the video, and the performance of extracting the key frames can be improved; in addition, in the process of video auditing, compared with the prior art, the method has the advantages that the normalization processing is carried out on the decoded image data, so that the operation speed is improved; in addition, the invention generates the extracted key frame into dynamic picture data, thereby facilitating the browsing of users.

Drawings

FIG. 1 is a block diagram of a system for extracting key frames of a video in accordance with one embodiment of the present invention;

FIG. 2 is a block diagram of a system for extracting key frames of a video in accordance with one embodiment of the present invention;

FIG. 3 is a schematic diagram of the internal structure of a key frame extraction unit according to an embodiment of the present invention;

FIG. 4 is a flow chart of a method for extracting key frames of a video according to one embodiment of the present invention;

fig. 5 is a flowchart of a method for extracting key frames from a video according to an embodiment of the present invention.

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments.

Detailed Description

In the invention, the histogram and the gray-scale map operation are carried out on the image data of two adjacent frames, and the final operation result is compared with the preset threshold value, so that the lens conversion boundary is obtained and the corresponding key frame is extracted. Thus, the performance of extracting the video key frame is improved.

The system for extracting video key frames comprises a key frame extraction unit 300, which is used for performing histogram and gray map operation on image data of two adjacent frames in video data to obtain a feature vector consisting of the operation result, comparing the Euclidean distance of the feature vector with the size of a preset threshold value, and acquiring a lens conversion boundary and extracting key frames based on the comparison result.

Fig. 1 shows a system structure for extracting a video key frame in an embodiment of the present invention, and the system includes a decoding unit 100, an image processing unit 200, and a key frame extraction unit 300. It should be noted that the connection relationship between the devices in all the figures of the present invention is for clearly explaining the need of information interaction and control process, and therefore, should be regarded as a logical connection relationship, and should not be limited to physical connection only. It should be noted that the communication method between the functional modules may be various, and the scope of the present invention should not be limited to a specific type of communication method. Wherein:

the decoding unit 100 is configured to decode the video data to obtain decoded image data. Since the video data uploaded to the internet is encoded video data in various formats, the video data in various formats needs to be decoded first to obtain original image data.

The image processing unit 200 is connected to the decoding unit 100 and the key frame extracting unit 300, performs data interaction, and is configured to perform normalization processing on the decoded image data and transmit the processed image data to the key frame extracting unit 300. Since the video data uploaded to the internet may have different sizes, normalization processing of the decoded image data is required to improve the calculation speed. In one embodiment, the image data may be normalized to a size of 120 pixels X90 pixels

The key frame extraction unit 300 is connected to the picture browsing terminal 500, performs data interaction, and is configured to perform histogram and grayscale operation on image data of two adjacent frames to obtain a feature vector composed of the operation result, compare the euclidean distance of the feature vector with a predetermined threshold, obtain a lens conversion boundary based on the comparison result, and extract a key frame.

In one embodiment, the shot transition boundaries include a shot fade boundary and a shot cut boundary, and the predetermined thresholds include a shot fade threshold and a shot cut threshold.

Based on the above embodiments, the present invention proposes another embodiment, and fig. 2 shows a system structure for extracting a video key frame in an embodiment of the present invention, which includes a picture synthesis unit 400 and a picture browsing terminal 500 in addition to a decoding unit 100, an image processing unit 200, and a key frame extraction unit 300, wherein:

the picture synthesizing unit 400 is connected to the key frame extracting unit 300 and the picture browsing terminal 500, performs data interaction, and is configured to synthesize the obtained key frames to generate dynamic picture data. In one embodiment, the generated moving picture data is a picture in gif file format.

The picture browsing terminal 500 is connected to the picture synthesizing unit 400, performs data interaction, and is configured to display the dynamic picture data generated by the picture synthesizing unit 400. The obtained dynamic picture data can be audited or analyzed and processed by the user.

It should be noted that the picture browsing terminal 500 may be in the same device as the picture composition unit 400, or may be connected to the picture composition unit 400 via a network. When connected via a network, in order to increase the transmission speed, the moving picture data generated by the picture synthesis unit 400 may be compressed, and the compressed moving picture data is transmitted to the picture viewing terminal 500 via the network, decompressed by the picture viewing terminal 500, and then displayed.

Fig. 3 shows an internal structure of the key frame extracting unit 300 in an embodiment of the present invention. The key frame extraction unit 300 includes an operation module 301, a buffer module 302, a timing module 303, and a frame extraction module 304, wherein:

(1) the operation module 301 is connected to the buffer module 302, the timing module 303 and the frame extraction module 304, performs data interaction, and is configured to perform histogram and gray-scale map operation on image data of two adjacent frames, obtain a feature vector including a frame difference value of the two adjacent frames, a mean difference value of the gray-scale map, and a variance difference value of the gray-scale map, and perform weighting processing on the frame difference value, the mean difference value, and the variance difference value to obtain a euclidean distance of the feature vector.

In one embodiment, the formula of the feature vector obtained by the operation module 301 is:

d (k, k +1) ═ Z (k, k +1), M (k, k +1), S (k, k +1)), where:

the formula for Z (k, k +1) is:

where Z (k, k +1) is a frame difference value between the k-th frame and the k + 1-th frame, M is the number of pixels per frame, N is the number of colors, h_k(i) Is the value of the histogram with color i in the k-th frame, h_k+1(i) Is the histogram value with color i for the k +1 th frame.

The formula for M (k, k +1) is:

<math> <mrow> <mi>M</mi> <mrow> <mo>(</mo> <mi>k</mi> <mo>,</mo> <mi>k</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>=</mo> <mover> <mi>μ</mi> <mo>&OverBar;</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>-</mo> <mover> <mi>μ</mi> <mo>&OverBar;</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> </mrow> </math>

wherein M (k, k +1) is the mean difference value of the grayscale images of the k-th frame and the k + 1-th frame,

is the average of the (k +1) th frame gray scale map,is the average of the k-th frame gray scale map.

The calculation formula of S (k, k +1) is:

<math> <mrow> <mi>S</mi> <mrow> <mo>(</mo> <mi>k</mi> <mo>,</mo> <mi>k</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>=</mo> <mover> <msup> <mi>σ</mi> <mn>2</mn> </msup> <mo>&OverBar;</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>-</mo> <mover> <msup> <mi>σ</mi> <mn>2</mn> </msup> <mo>&OverBar;</mo> </mover> <mrow> <mo>(</mo> <mi>k</mi> <mo>)</mo> </mrow> </mrow> </math>

wherein S (k, k +1) is the difference of the variances of the gray level images of the k-th frame and the k + 1-th frame,

is the variance of the grayscale image of the (k +1) th frame,

is the variance of the grayscale image of the k-th frame.

In an exemplary embodiment, the calculation formula of the mean value of a certain frame gray scale map is:

<math> <mrow> <mover> <mi>μ</mi> <mo>&OverBar;</mo> </mover> <mo>=</mo> <mfrac> <mn>1</mn> <mi>hw</mi> </mfrac> <munderover> <mi>Σ</mi> <mrow> <mi>y</mi> <mo>=</mo> <mn>0</mn> </mrow> <mrow> <mi>h</mi> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <munderover> <mi>Σ</mi> <mrow> <mi>x</mi> <mo>=</mo> <mn>0</mn> </mrow> <mrow> <mi>w</mi> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <mi>I</mi> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> </mrow> </math>

wherein,

is the mean of the gray level image, h is the height of the image, w is the width of the image, and I (x, y) is the gray level of the pixel.

The calculation formula of the variance of a certain frame gray level image is as follows:

<math> <mrow> <mover> <msup> <mi>σ</mi> <mn>2</mn> </msup> <mo>&OverBar;</mo> </mover> <mo>=</mo> <mfrac> <mn>1</mn> <mi>hw</mi> </mfrac> <munderover> <mi>Σ</mi> <mrow> <mi>y</mi> <mo>=</mo> <mn>0</mn> </mrow> <mrow> <mi>h</mi> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <munderover> <mi>Σ</mi> <mrow> <mi>x</mi> <mo>=</mo> <mn>0</mn> </mrow> <mrow> <mi>w</mi> <mo>-</mo> <mn>1</mn> </mrow> </munderover> <msup> <mrow> <mo>(</mo> <mi>I</mi> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>-</mo> <mover> <mi>μ</mi> <mo>&OverBar;</mo> </mover> <mo>)</mo> </mrow> <mn>2</mn> </msup> </mrow> </math>

wherein,is the variance of the gray scale map, h is the height of the image, w is the width of the image, I (x, y) is the gray scale value of the pixel,is the average of the gray scale map for that frame.

After the operation module 301 obtains the feature vector, the frame difference value, the mean difference value, and the variance difference value are weighted to obtain the euclidean distance of the feature vector, in this embodiment, the calculation formula of the euclidean distance is:

where d is the Euclidean distance of the feature vector, Z (k-1, k) is the frame difference value between the k-1 th frame and the k-1 th frame, Z (k, k +1) is the frame difference value between the k-1 th frame and the k +1 th frame, M (k-1, k) is the mean difference value of the grayscale images of the k-1 th frame and the k-1 th frame, M (k, k +1) is the mean difference value of the grayscale images of the k-1 th frame and the k +1 th frame, S (k-1, k) is the variance difference value of the grayscale images of the k-1 th frame and the k-1 th frame, S (k, k +1) is the variance difference value of the grayscale images of the k-1 th frame and the k +1 th frame, and alpha, beta and delta are weighting coefficients. The values of α, β, and δ may vary according to different application environments, and are preferably 1.7, 1, and 1.2, respectively.

(2) The buffer module 302 is connected to the operation module 301, the timing module 303 and the frame extraction module 304 for performing data interaction, and is configured to store image data features obtained by calculation of the operation module 301, such as the histogram and the grayscale map values obtained by the calculation, and the buffer module 302 may be configured to store only image data features of a segment of continuous frames. In a preferred embodiment, the buffer module 302 is configured to store image data characteristics of 50 consecutive frames, and when the space of the buffer module 302 is full, the data is replaced with image data characteristics of the next 50 frames. When the operation module 301 performs further operation, the image data features stored in the cache module 302 can be directly extracted without recalculating the original image data, thereby increasing the calculation speed.

(3) And the timing module 303 is connected with the operation module 301, the buffer module 302 and the frame extraction module 304, performs data interaction, and is used for timing. For some videos in which shot gradual change and shot shear are not likely to occur in some scenes, occasional missed detection may be caused, and when the timing module 303 times out a certain time and does not output a key frame, the frame extraction module 304 may forcibly output a frame to prevent the missed detection phenomenon, so as to further improve the performance of extracting the video key frame according to the present invention.

(4) And the frame extraction module 304 is connected with the operation module 301, the cache module 302 and the timing module 303, performs data interaction, and is configured to compare the european style distance with the shot gradual change threshold and the shot shear threshold, acquire the shot gradual change boundary and extract a corresponding key frame when the european style distance is greater than the shot gradual change threshold and less than the shot shear threshold, and acquire the shot shear boundary and extract a corresponding key frame when the european style distance is greater than the shot shear threshold.

In an exemplary embodiment, d is the calculated euclidean distance, T1 is set as a shot gradual change threshold, Th is a shot shear threshold, T1 is preferably 0.1, and Th is preferably 0.3. The operation module 301 extracts image data features of two adjacent frames from the buffer module 302, and calculates to obtain the euclidean distance d. When d is less than T1, the shot is not gradual changed and sheared, when T1 is less than d and less than Th, the shot is gradual changed, and the shot gradual change boundary can be obtained before d does not reach Th, and when d is more than Th, the shot is sheared, and the shot shearing boundary can be obtained during the process of shot shearing. In one embodiment, during the process of the shot fading, the key frames can be extracted at the time point of the beginning and the end of the fading, and the middle point of the fading, and when the shot is sheared, the key frames can be extracted at the boundary of the beginning shearing. In this way, the extracted key frames can reflect the profile of the entire video.

Fig. 4 shows a flow of a method for extracting a video key frame in an embodiment of the present invention, where the flow of the method is based on the system structure shown in fig. 1, and the specific process is as follows:

in step S401, the key frame extraction unit 300 performs histogram and grayscale map operations on the image data of two adjacent frames to obtain a feature vector composed of the operation results.

In step S402, the key frame extraction unit 300 compares the euclidean distance of the feature vector with the magnitude of a predetermined threshold value, and acquires a shot transition boundary based on the comparison result.

In step S403, the key frame extraction unit 300 extracts the key frame according to the shot transition boundary.

In one embodiment, the feature vector obtained by the key frame extracting unit 300 includes a frame difference value, a mean difference value of a gray level image, and a variance difference value of the gray level image of two adjacent frames, which are obtained by performing a weighting process on the frame difference value, the mean difference value, and the variance difference value.

Fig. 5 shows a flow of a method for extracting a video key frame in an embodiment of the present invention, where the flow of the method is based on the system structure shown in fig. 2, and the specific process is as follows:

in step S501, the decoding unit 100 decodes the video data to obtain decoded image data.

In step S502, the image processing unit 200 performs normalization processing on the decoded image data. In one embodiment, the image processing unit 200 normalizes the decoded image data into image data of a size of 120 pixels X90 pixels.

In step S503, the key frame extraction unit 300 performs histogram and grayscale map operations on the image data of two adjacent frames to obtain a feature vector composed of the operation results.

In one embodiment, the specific process of step S503 is:

the operation module 301 performs histogram and gray level map operation on two adjacent frames of image data to obtain a frame difference value, a mean difference value of the gray level map, and a variance difference value of the gray level map of the two adjacent frames, and obtain a feature vector including the frame difference value, the mean difference value, and the variance difference value.

d (k, k +1) ═ Z (k, k +1), M (k, k +1), S (k, k +1)), where:

the formula for Z (k, k +1) is:

The formula for M (k, k +1) is:

wherein M (k, k +1) is the mean difference value of the grayscale images of the k-th frame and the k + 1-th frame,is the average of the (k +1) th frame gray scale map,

is the average of the k-th frame gray scale map.

The calculation formula of S (k, k +1) is:

is the variance of the grayscale image of the (k +1) th frame,

is the variance of the grayscale image of the k-th frame.

wherein,

wherein,

is a grey scale mapVariance, h is the height of the image, w is the width of the image, I (x, y) is the gray value of the pixel,

is the average of the frame gray map.

In step S504, the key frame acquisition unit 300 compares the euclidean distance of the feature vector with the magnitude of a predetermined threshold, and acquires a shot transition boundary and extracts a corresponding key frame according to the comparison result.

In one embodiment, the specific process of step S504 is:

after the operation module 301 obtains the feature vector, the weighted processing is performed on the frame difference value, the mean difference value, and the variance difference value obtained by the operation to obtain the euclidean distance of the feature vector, in this embodiment, the calculation formula of the euclidean distance is:

wherein d is the Euclidean distance of the feature vector, Z (k-1, k) is the frame difference value between the k-1 th frame and the k-1 th frame, Z (k, k +1) is the frame difference value between the k-1 th frame and the k +1 th frame, M (k-1, k) is the mean difference value of the gray level images of the k-1 th frame and the k-1 th frame, M (k, k +1) is the mean difference value of the gray level images of the k-1 th frame and the k +1 th frame, S (k-1, k) is the variance difference value of the gray level images of the k-1 th frame and the k-1 th frame, and S (k, k +1) is the variance difference value of the gray level images of the k-1 th frame and the k +1 th frame. α, β, δ are weighting coefficients. The values of α, β, and δ may vary according to different application environments, and are preferably 1.7, 1, and 1.2, respectively.

After obtaining the euclidean distance, the frame extraction module 304 compares the euclidean distance with a predetermined threshold. In one embodiment, the predetermined thresholds include a shot fade threshold and a shot cut threshold, and when the euclidean distance is greater than the shot fade threshold and less than the shot cut threshold, the shot fade boundary is acquired and the corresponding key frame is extracted, and when the euclidean distance is greater than the shot cut threshold, the shot cut boundary is acquired and the corresponding key frame is extracted.

In an exemplary embodiment, d is the calculated euclidean distance, T1 is set as a shot gradual change threshold, Th is a shot shear threshold, T1 is preferably 0.1, and Th is preferably 0.3. The operation module 301 extracts image data features of two adjacent frames from the buffer module 302, and calculates to obtain the euclidean distance d. When d < T1, the shot does not fade and shear, when T1 < d < Th, the shot is fading, and before d does not reach Th, the shot fade boundary can be obtained, when d > Th, the shot has sheared, and during the occurrence of shot shearing, the shot shear boundary is obtained. In one embodiment, during the process of the shot fading, the key frames can be extracted at the time point of the beginning and the end of the fading, and the middle point of the fading, and when the shot is sheared, the key frames can be extracted at the boundary of the beginning shearing.

In a preferred embodiment, the buffer module 302 is configured to store image data features of consecutive frames, for example, 50 consecutive frames of image data features are set to be stored, and when the space of the buffer module 302 is full, the image data features of the next 50 frames are replaced. The operation module 301 and the frame extraction module 304 can directly extract the image data features stored in the buffer module 302 for further operation or judgment.

In another preferred embodiment, the timing module 304 is provided for timing. In some scenes, videos in which shots are not easy to fade or cut may be occasionally missed, and when the timing module 303 times out a certain time and does not output a key frame, the frame extraction module 304 forcibly outputs a frame to prevent the missed detection, so as to further improve the video auditing performance of the invention.

In step S505, the picture synthesis unit 400 synthesizes the extracted key frames, and generates moving picture data. In one embodiment, the generated moving picture data is a picture in gif file format.

In step S506, the picture browsing terminal 500 displays the moving picture data. In an embodiment, the picture browsing terminal 500 is connected to the picture synthesizing unit 400 via a network, the dynamic picture data generated by the picture synthesizing unit 400 may be compressed and then transmitted to the picture browsing terminal 500, and then decompressed by the picture browsing terminal 500 for display, and a user at the picture browsing terminal 500 may view the dynamic picture data to obtain an overview of a video, so as to perform an audit or analysis process on the extracted image data of the key frame.

It should be noted that the above embodiments are only used for explaining the technical solution of the present invention, and the present invention is typically applied to, but not limited to, a video audit analysis system, and the method set forth in the present invention can also be applied to other similar video processing systems.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A system for extracting key frames from a video, the system comprising:

and the key frame extraction unit is used for performing histogram and gray-scale map operation on the image data of two adjacent frames in the video data to obtain a feature vector consisting of the operation result, comparing the Euclidean distance of the feature vector with the size of a preset threshold value, acquiring a lens conversion boundary based on the comparison result and extracting a key frame.

2. The system for extracting video key frames according to claim 1, wherein said key frame extracting unit comprises:

3. The system for extracting video key frames according to claim 1 or 2, wherein the shot transition boundaries comprise shot fade boundaries and shot shear boundaries, and the predetermined thresholds comprise a shot fade threshold and a shot shear threshold.

4. The system for extracting video key frames according to claim 3, wherein said frame extracting module is further configured to: comparing the Euclidean distance with the lens gradual change threshold value and the lens shear threshold value, when the Euclidean distance is larger than the lens gradual change threshold value and smaller than the lens shear threshold value, acquiring a lens gradual change boundary and extracting a corresponding key frame, and when the Euclidean distance is larger than the lens shear threshold value, acquiring a lens shear boundary and extracting a corresponding key frame.

5. The system for extracting video key frames according to claim 1, further comprising:

6. A method for extracting key frames from a video, the method comprising the steps of:

C. and extracting the key frame according to the shot conversion boundary.

7. The method of claim 6, wherein the feature vector is composed of a frame difference value of two adjacent frames, a mean difference value of a gray-scale image, and a variance difference value of the gray-scale image, and the Euclidean distance is obtained by weighting the frame difference value, the mean difference value, and the variance difference value.

8. The method of extracting video key frames according to claim 6 or 7, wherein the shot transition boundary comprises a shot fade boundary and a shot shear boundary, the predetermined threshold comprises a shot fade threshold and a shot shear threshold, the step B further comprises: comparing the Euclidean distance with the lens gradual change threshold value and the lens shear threshold value, when the Euclidean distance is larger than the lens gradual change threshold value and smaller than the lens shear threshold value, acquiring a lens gradual change boundary, and when the Euclidean distance is larger than the lens shear threshold value, acquiring a lens shear boundary.

9. The method for extracting video key frames according to claim 6, wherein said step A is preceded by the steps of: decoding the video data to obtain decoded image data, and performing normalization processing on the decoded image data.

10. The method for extracting video key frames according to claim 6, further comprising after said step C: and synthesizing the video key frame to generate dynamic picture data and displaying the dynamic picture data.