KR20050041761A

KR20050041761A - Method of detecting shot change frames

Info

Publication number: KR20050041761A
Application number: KR1020030077029A
Authority: KR
Inventors: 나원; 백중환; 홍승범
Original assignee: 학교법인 정석학원
Priority date: 2003-10-31
Filing date: 2003-10-31
Publication date: 2005-05-04

Abstract

The present invention relates to image processing, and more particularly to a method of detecting shot change frames. The method comprises: acquiring, from a training video, the amount of feature change between two adjacent frames for each of a plurality of shot change frame detection parameters; setting, for each shot change frame detection parameter, a threshold that maximizes the number of shot change frames; generating a binary classification tree using the plurality of shot change frame detection parameters and the thresholds set for each detection parameter; and detecting shot change frames in a given video using the generated binary classification tree.

Description

Method of detecting shot change frames

The present invention relates to image processing technology, and more specifically to a shot change frame detection method that detects the frames at which a shot changes in a video.

Until recently, videos such as movies and personal recordings were watched mainly on VTRs and TVs. With advances in electrical and electronic technology, however, videos can now be watched not only on DVRs, PVRs, and computers but increasingly on mobile phones as well. Accordingly, detection techniques that find desired screen information quickly and easily, so that users can conveniently search and watch videos, have been increasing rapidly.

Among the detection techniques proposed so far is one that detects only the important key frames of an entire video. In general, a frame is the basic unit of screen information that makes up a video, and a shot is the set of frames from the start of one camera operation until the switch to another camera. Two adjacent frames within a shot are usually highly similar in content, pixel brightness, and histogram characteristics, whereas two adjacent frames at a shot change differ in content, pixel brightness, and histogram. Previously proposed key frame detection techniques selectively use a single feature, such as the pixel brightness difference or the histogram change between two adjacent frames, to detect and mark the frames at which a shot changes, so that the user can quickly grasp the overall content of the video.

The pixel brightness comparison between two adjacent frames detects shot changes very well in videos where the background is fixed and the camera hardly moves or moves slowly. The histogram comparison between two adjacent frames, in contrast, detects shot changes very well in videos where an object or the camera moves or rotates quickly. However, shot change frame detection performance drops markedly when the pixel brightness comparison is applied to videos with fast object or camera motion or rotation, and likewise when the histogram comparison is applied to videos with a fixed background and a nearly static or slowly moving camera.

Consequently, when the previously proposed key frame detection techniques are used to detect shot change frames in videos produced with a variety of camera shooting and editing methods, the user cannot rely on the detection performance.

The present invention was conceived against this background. Its object is to provide a shot change frame detection method whose detection performance the user can rely on when detecting shot change frames in videos produced with a variety of camera shooting and editing methods.

To achieve this object, a shot change frame detection method according to one aspect of the present invention comprises: acquiring, from a training video, the amount of feature change between two adjacent frames for each of a plurality of shot change frame detection parameters; receiving, for each shot change frame detection parameter, a threshold that maximizes the number of shot change frames; generating a binary classification tree using the plurality of shot change frame detection parameters and the thresholds set for each detection parameter; and detecting shot change frames in a given video using the generated binary classification tree.

Hereinafter, the present invention is described in detail through preferred embodiments, with reference to the accompanying drawings, so that those skilled in the art can readily understand and reproduce the foregoing and additional aspects.

FIG. 1 illustrates the process of detecting shot change frames in a video according to the present invention. FIG. 2 is a set of graphs showing, for each shot change frame detection parameter, examples of shot change frames that were missed and of frames wrongly judged to be shot changes. FIG. 3 illustrates a binary classification tree according to an embodiment of the present invention. The description below refers to FIGS. 1 to 3.

Referring to FIG. 1, to detect shot change frames in an arbitrary video, the amount of feature change between two adjacent frames must first be acquired from a training video for each of a plurality of shot change frame detection parameters (S10). In a preferred embodiment of the present invention, the plurality of shot change frame detection parameters include a feature variable for the full-area pixel brightness difference between two adjacent frames (LPD), a feature variable for the full-area pixel brightness histogram difference (IHD), a feature variable for the local-area pixel brightness histogram difference (LHL), a feature variable for the full-area color histogram difference (HHD), and a feature variable for the similarity of blocks over the full area using the brightness component (LLR).

The feature variable for the full-area pixel brightness difference (LPD) detects shot changes well in videos where the background is fixed and the camera hardly moves or moves slowly. It is calculated according to Equation 1 below.

Here, N and M denote the horizontal and vertical size of the image, and the remaining term is the brightness value of the m-th frame at a given pixel position.
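Equation 1 itself is not legible in this text, so the following is only a minimal sketch of a full-area pixel brightness difference under stated assumptions: the absolute brightness differences of co-located pixels are summed over the N x M image and divided by the pixel count and by the maximum brightness level. The function name `lpd` and the normalization by `max_level` are illustrative choices, not taken from the patent.

```python
def lpd(prev_frame, curr_frame, max_level=255):
    """Mean absolute brightness difference between two equally sized frames.

    Frames are lists of rows of integer brightness values. Dividing by
    max_level is an assumption that scales the result to the range 0..1.
    """
    height = len(prev_frame)       # M: vertical size
    width = len(prev_frame[0])     # N: horizontal size
    total = 0
    for row_prev, row_curr in zip(prev_frame, curr_frame):
        for a, b in zip(row_prev, row_curr):
            total += abs(a - b)
    return total / (width * height * max_level)


frame_a = [[10, 10], [10, 10]]
frame_b = [[10, 10], [10, 10]]
frame_c = [[255, 255], [255, 255]]

print(lpd(frame_a, frame_b))  # identical frames give 0.0
print(lpd(frame_a, frame_c))  # large value for a complete brightness change
```

Because every pixel is compared with the pixel at the same position, any camera or object motion shifts pixels and inflates this value, which is consistent with the text's remark that the feature suits static scenes.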

The feature variables for the full-area pixel brightness histogram difference (IHD), the local-area pixel brightness histogram difference (LHL), and the full-area color histogram difference (HHD) each compute the distribution of pixel brightness or color in every frame and then compare the histograms of two adjacent frames. This approach is stable under camera rotation, camera movement (tilt, pan), and fast object motion. However, because a histogram-based technique uses only the brightness or color distribution of a frame, positional information is lost. As a result, a shot change can be missed when the histograms of the two frames are similar. To address this, a local histogram method is used: the whole frame is divided into a predetermined number of equal blocks, and the histogram of each block is compared with that of the corresponding block in the adjacent frame to obtain a similarity. The histogram approach therefore performs very well on cuts where the brightness or color distribution changes. The feature variable IHD is calculated according to Equation 2 below, and the feature variable LHL is calculated according to Equation 3 below.

Here, the histogram term denotes the histogram of the m-th frame, and B is the maximum brightness value.

Here, |BS| is the number of pixels in a block.
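Equation 2 is missing from this text. As a hedged illustration of a full-area histogram difference, the sketch below builds a brightness histogram with bins 0..B for each frame and sums the absolute bin differences. The division by twice the pixel count is an assumption made so that the result lies between 0 and 1; the names `histogram` and `ihd` are hypothetical.

```python
def histogram(frame, levels=256):
    """Count how many pixels take each brightness value 0..levels-1."""
    counts = [0] * levels
    for row in frame:
        for value in row:
            counts[value] += 1
    return counts


def ihd(prev_frame, curr_frame, levels=256):
    """Normalized sum of absolute histogram bin differences (assumed form)."""
    h_prev = histogram(prev_frame, levels)
    h_curr = histogram(curr_frame, levels)
    pixels = sum(h_prev)
    diff = sum(abs(a - b) for a, b in zip(h_prev, h_curr))
    return diff / (2 * pixels)


# Same pixel values in different positions: the histogram cannot tell them apart.
print(ihd([[0, 255]], [[255, 0]]))    # 0.0
# Completely different brightness distribution.
print(ihd([[0, 0]], [[255, 255]]))    # 1.0
```

The first call shows the weakness noted in the text: positional information is lost, so two frames with rearranged content look identical to a global histogram.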

In this embodiment, the feature variable for the local-area pixel brightness histogram difference (LHL) divides the whole image evenly into 16 blocks and measures the similarity between corresponding blocks. Of the measured similarity values, the 8 blocks with the lowest values are used to detect the scene change point. The 8 blocks with the highest values are excluded in order to remove the effect of object motion that occurs in only part of the frame.
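The block selection described above can be sketched as follows. Equation 3 is not available here, so the per-block measure is an assumed normalized histogram difference, and the sketch reads "lowest values" as the 8 smallest block differences, so that the 8 largest, which local object motion would produce, are dropped. That reading, the 4 x 4 grid layout, and all function names are assumptions.

```python
def split_blocks(frame, grid=4):
    """Split a frame into grid x grid equal blocks, each a flat pixel list."""
    block_h = len(frame) // grid
    block_w = len(frame[0]) // grid
    blocks = []
    for by in range(grid):
        for bx in range(grid):
            blocks.append([
                frame[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ])
    return blocks


def block_difference(block_a, block_b, levels=256):
    """Histogram difference of two blocks, divided by 2|BS| (assumed form)."""
    h_a = [0] * levels
    h_b = [0] * levels
    for value in block_a:
        h_a[value] += 1
    for value in block_b:
        h_b[value] += 1
    return sum(abs(a - b) for a, b in zip(h_a, h_b)) / (2 * len(block_a))


def lhl(prev_frame, curr_frame, grid=4, keep=8):
    """Average of the `keep` smallest block differences out of grid*grid."""
    pairs = zip(split_blocks(prev_frame, grid), split_blocks(curr_frame, grid))
    diffs = sorted(block_difference(a, b) for a, b in pairs)
    return sum(diffs[:keep]) / keep


dark = [[0] * 4 for _ in range(4)]
bright = [[255] * 4 for _ in range(4)]
local_motion = [row[:] for row in dark]
local_motion[0][0] = local_motion[0][1] = local_motion[1][0] = 255

print(lhl(dark, local_motion))  # 0.0: a change in 3 of 16 blocks is ignored
print(lhl(dark, bright))        # 1.0: every block changed
```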

The feature variable for the similarity of blocks over the full area using the brightness component (LLR) is a block-based measure: adjacent frames are divided into a predetermined number of blocks, and similarity is measured with a likelihood ratio between corresponding blocks. Because it is applied block by block, it is less sensitive to motion. It also performs very well on cuts in which objects of arbitrary shape change. In the implementation, a frame is divided into equal, non-overlapping blocks. The mean and variance of each block are computed, the block is compared with the blocks surrounding the corresponding block in the adjacent frame, and the likelihood ratio of the block with the highest similarity is taken. If the resulting likelihood ratio is greater than a fixed threshold, the weight is set to 1; otherwise it is set to 0. The weights are then used to assess the similarity of the whole frame. The feature variable LLR is calculated according to Equation 4 below.

Here, the two terms denote the mean and variance of the i-th block.
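Equation 4 is not reproduced in this text. The sketch below uses a commonly cited form of the block likelihood ratio built from the block means and variances; whether the patent uses exactly this form is an assumption. It also simplifies the described procedure by comparing only co-located blocks instead of searching the surrounding blocks, and the small `eps` term is added only to avoid division by zero for flat blocks.

```python
def mean_and_variance(block):
    """Mean and population variance of a flat list of brightness values."""
    n = len(block)
    mean = sum(block) / n
    variance = sum((v - mean) ** 2 for v in block) / n
    return mean, variance


def likelihood_ratio(block_a, block_b, eps=1e-6):
    """Assumed likelihood ratio: close to 1 for similar blocks, large otherwise."""
    mean_a, var_a = mean_and_variance(block_a)
    mean_b, var_b = mean_and_variance(block_b)
    numerator = ((var_a + var_b) / 2 + ((mean_a - mean_b) / 2) ** 2) ** 2
    return numerator / (var_a * var_b + eps)


def changed_block_count(blocks_prev, blocks_curr, threshold):
    """Weight 1 for each block whose ratio exceeds the threshold, else 0."""
    return sum(
        1 if likelihood_ratio(a, b) > threshold else 0
        for a, b in zip(blocks_prev, blocks_curr)
    )


prev_blocks = [[0, 2], [0, 2]]
curr_blocks = [[0, 2], [100, 102]]   # only the second block changed
print(likelihood_ratio(prev_blocks[0], curr_blocks[0]))
print(changed_block_count(prev_blocks, curr_blocks, threshold=2.0))  # 1
```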

After the amount of feature change between adjacent frame pairs has been acquired from the training video for each of the five shot change frame detection parameters described above, a threshold must be set for each detection parameter (S12). In this embodiment, the thresholds are set experimentally using training videos taken from the movies The Matrix and Mission Impossible. Each training video has 15 frames per second and an image size of 320 × 240. The two training videos do not contain a variety of editing techniques; they consist only of camera movement (pan, tilt, zoom) and cuts. The total number of frames and the number of shot change points in the two training videos were obtained by direct inspection, and the results are shown in Table 1 below. Here, SC (Shot Change) denotes a shot change point, and NSC (No Shot Change) denotes a frame that is not a shot change point.

Table 1
Sequence             Total frames   SC   NSC
The Matrix           2167           48   2119
Mission Impossible   2429           30   2399

Next, in this embodiment, the threshold for each shot change frame detection parameter is preferably set so that the number of frames classified as not being shot changes is maximized in the binary classification. As shown in FIGS. 2a to 2e, the thresholds are set to 0.051 for LPD, 0.154 for HHD, 0.162 for IHD, 2.2 for LLR, and 0.17 for LHL. The recall index, precision index, and evaluation index (EI) used to evaluate the detection performance of each shot change frame detection parameter are defined as in Equation 5 below.

EI = (recall + precision) / 2

Here, S_C is the total number of shot change points present in the training video, S_M is the number of shot change points that a given shot change frame detection parameter failed to detect (MD: Missed Detection), and S_F is the number of frames wrongly judged to be shot change points (FA: False Alarm). The closer EI is to 1, the better the performance.
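The recall and precision expressions of Equation 5 are not legible in this text. The forms in the sketch below are an assumption, chosen because they reproduce the rounded percentages reported in Table 2 from the S_C, S_M, and S_F counts; they should not be read as the patent's exact definitions. The example uses the HHD counts for The Matrix (S_C = 48, S_M = 3, S_F = 25).

```python
def recall(s_c, s_m):
    """Fraction of true shot changes that were detected (assumed form)."""
    return (s_c - s_m) / s_c


def precision(s_c, s_f):
    """Assumed form that matches the Table 2 percentages: S_C / (S_C + S_F)."""
    return s_c / (s_c + s_f)


def evaluation_index(rec, prec):
    """EI as stated in the text: the mean of recall and precision."""
    return (rec + prec) / 2


rec = recall(48, 3)
prec = precision(48, 25)
print(round(100 * rec), round(100 * prec))        # 94 66
print(round(100 * evaluation_index(rec, prec)))   # 80
```

The same functions give 100 and 97 for the Mission Impossible LLR counts (S_M = 0, S_F = 1), which also agrees with the table.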

Table 2

The Matrix (S_C = 48)
                LPD    IHD    LHL    HHD    LLR
S_M             4      3      3      3      3
S_F             48     39     34     25     37
recall (%)      92     94     94     94     94
precision (%)   50     55     59     66     56
EI (%)          71     74.5   76.5   80     75

Mission Impossible (S_C = 30)
                LPD    IHD    LHL    HHD    LLR
S_M             0      1      0      1      0
S_F             6      5      4      8      1
recall (%)      100    97     100    97     100
precision (%)   83     86     88     79     97
EI (%)          92     91.5   94     88     99

In Table 2, the thresholds were set experimentally so that EI is maximized for each shot change frame detection parameter. For The Matrix, the feature variable for the full-area color histogram difference (HHD) performs best in terms of EI, while the feature variable for the full-area pixel brightness difference between two adjacent frames (LPD) performs worst. For Mission Impossible, the feature variable for block similarity using the brightness component (LLR) performs well, while HHD performs poorly. As these results show, Mission Impossible can be handled well by a single technique, whereas for The Matrix no single technique achieves strong performance.
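The experimental threshold setting described above can be pictured as a sweep over candidate thresholds that keeps the one with the highest EI. The sketch below is a toy illustration: the feature values, labels, and candidate list are invented, the rule "value above threshold means shot change" is an assumption, and the recall and precision forms are the ones assumed to match Table 2.

```python
def evaluation_index(values, labels, threshold):
    """EI of a single-feature detector that flags values above the threshold."""
    detected = [v > threshold for v in values]
    s_c = sum(1 for is_sc in labels if is_sc)
    s_m = sum(1 for d, is_sc in zip(detected, labels) if is_sc and not d)
    s_f = sum(1 for d, is_sc in zip(detected, labels) if d and not is_sc)
    rec = (s_c - s_m) / s_c          # assumed recall form
    prec = s_c / (s_c + s_f)         # assumed precision form
    return (rec + prec) / 2


def best_threshold(values, labels, candidates):
    """Return the candidate threshold with the highest EI."""
    return max(candidates, key=lambda t: evaluation_index(values, labels, t))


# Hypothetical feature values for six frame pairs; True marks a real shot change.
values = [0.01, 0.02, 0.30, 0.03, 0.25, 0.04]
labels = [False, False, True, False, True, False]
candidates = [0.005, 0.1, 0.28]

print(best_threshold(values, labels, candidates))  # 0.1
```

A threshold that is too low raises S_F and one that is too high raises S_M, so EI peaks in between, which is why a per-feature sweep is a natural way to pick the values reported in the text.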

FIG. 2 shows, for the two training videos, the shot change frames missed by each shot change frame detection parameter and the frames wrongly judged to be shot changes. Looking at the individual parameters, HHD and LHL correctly find the shot change points A and C. LPD, however, misses the shot change frame at C, IHD produces false alarms at D and F, and LLR produces a false alarm at E.

Thus, when each detection technique is run independently on the same sequence, the results differ according to the strengths of each technique, but when several features are applied together, a relation between the techniques becomes apparent. For example, when IHD and LLR are both applied, IHD detects shot changes at A, C, D, and F, whereas LLR detects shot changes at A, C, and E. Applying IHD first and then LLR therefore finds exactly the true shot change points A and C.
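The IHD and LLR example above amounts to keeping only the frames that both detectors flag. A minimal sketch, using the frame labels from the text and a hypothetical helper name `cascade`:

```python
def cascade(*detections):
    """Keep only the frames flagged by every detector in the chain."""
    remaining = set(detections[0])
    for flagged in detections[1:]:
        remaining &= set(flagged)
    return remaining


ihd_detections = {"A", "C", "D", "F"}   # D and F are false alarms
llr_detections = {"A", "C", "E"}        # E is a false alarm

print(sorted(cascade(ihd_detections, llr_detections)))  # ['A', 'C']
```

Each detector removes the other's false alarms because they fail on different frames, which is the motivation for arranging the features in a tree rather than relying on one of them.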

Once a threshold has been set for each shot change frame detection parameter, an optimal binary classification tree for finding shot change frames is generated using the detection parameters and their thresholds (S14).

In this embodiment, it is preferable that, at each split of the binary classification tree, the shot change frame detection parameter and threshold are set in sequence so that the number of frames classified as not being shot changes is maximized.

Referring to FIG. 3, the first binary split uses LHL, which is able to separate out 2058 frames that are not shot changes. The threshold at this split is 0.165, and no shot change frames fall into the group separated by it. The remaining 109 frames, that is, all frames other than those 2058, are then tested again with the other shot change frame detection parameters. For this purpose, the next split is likewise given the detection parameter and threshold that maximize the number of frames classified as not being shot changes. In this embodiment that parameter is LLR, with a threshold of 1.791; 26 frames do not exceed this threshold. At the next split, HHD is applied to the 83 frames that remain after removing those 26 from the 109, with a threshold of 0.075. Of these 83 frames, 7 do not exceed 0.075 and are therefore judged not to be shot changes. At the last split, the remaining 76 frames are evaluated with the remaining feature parameters: 48 of them are shot change frames and 28 are nsc frames, but because the sc class is in the majority, all 76 frames are assigned to the sc class by majority rule. The tree is grown as a maximal binary classification tree, until no impurity remains or a stopping criterion is satisfied.
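The tree described above behaves like a chain of threshold tests in which each split sends some frames to the non-shot-change class and passes the rest on. The sketch below encodes the three thresholds given in the text; the comparison direction (a value not exceeding the threshold means NSC) is stated for LLR and HHD and is assumed for LHL, and the names `TREE`, `classify`, and `remaining_after_each_split` are illustrative.

```python
# (feature name, threshold) in the order the splits are applied in FIG. 3.
TREE = [("LHL", 0.165), ("LLR", 1.791), ("HHD", 0.075)]


def classify(features, tree=TREE):
    """Return "NSC" at the first split whose threshold is not exceeded, else "SC"."""
    for name, threshold in tree:
        if features[name] <= threshold:
            return "NSC"
    return "SC"


def remaining_after_each_split(total, removed_per_split):
    """Frames still undecided after each split removes its NSC frames."""
    remaining = []
    for removed in removed_per_split:
        total -= removed
        remaining.append(total)
    return remaining


print(classify({"LHL": 0.10, "LLR": 3.0, "HHD": 0.2}))   # NSC
print(classify({"LHL": 0.30, "LLR": 2.5, "HHD": 0.2}))   # SC
print(remaining_after_each_split(2167, [2058, 26, 7]))   # [109, 83, 76]
```

The last line reproduces the frame counts in the text: 2167 training frames shrink to 109, then 83, then the 76 frames that end up in the sc leaf.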

Once the optimal binary classification tree for finding shot change frames has been generated using the thresholds set for each detection parameter, it is used to detect shot change frames in an arbitrary video whose shot change frames are unknown (S14).

Table 5 shows the per-node statistics (Table 5a) and the classification results (Table 5b) obtained with the drop-case method, using The Matrix as the training video: the feature change amounts for each shot change frame detection parameter are acquired from Mission Impossible and passed through the binary classification tree of FIG. 3.

Table 5a
Node   Response   Prob    N      Correct   Frames by classification
1      NSC        0.957   2324   1.0       2324
2      NSC        0.029   70     1.0       70
3      -          0       0      0         0
4      SC         0.014   35     0.857     35

Table 5b
Actual class       Predicted SC   Predicted NSC   Frames by classification
SC                 30             0               30
NSC                5              2394            2394
Total              35             2394            2429
Accuracy           1.000          0.998
Overall accuracy   0.999

Table 6 shows the per-node statistics (Table 6a) and the classification results (Table 6b) obtained with the drop-case method, using The Matrix as the training video: the feature change amounts for each shot change frame detection parameter are acquired from an arbitrary documentary video and passed through the binary classification tree of FIG. 3.

Table 6a
Node   Response   Prob    N      Correct   Frames by classification
1      NSC        0.859   2331   1.0       2331
2      NSC        0.127   345    0.991     345
3      -          0       0      0         0
4      SC         0.015   41     0.829     41

Table 6b
Actual class       Predicted SC   Predicted NSC   Frames by classification
SC                 34             3               37
NSC                7              2672            2679
Total              41             2675            2716
Accuracy           0.919          0.997
Overall accuracy   0.958

Tables 5a and 6a show the data collected at each terminal node when the test samples are dropped one by one into the binary classification tree of FIG. 3. The information collected at each node is analyzed under four items. "Response" is the class predicted for the data collected at the node. "Prob" (probability) is the fraction of all data samples that were collected at the node. "N" is the number of samples collected at the node. "Correct" measures the class purity of the node when the collected data are divided by class. Taking Table 5a as an example, node 1 collected 2324 samples (N), all 2324 of which belong to the nsc class, so "Correct" is 1.0 and the "Response" of node 1 is the nsc class. "Prob" is 0.957 because 2324 of the 2429 frames were classified at node 1. Node 2 collected 70 samples with "Correct" equal to 1.0 and is also predicted as the nsc class. Node 3 collected no samples, so no class can be predicted for it. Node 4 collected 35 samples and is predicted as the sc class with a purity of 0.857.
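The four node items can be recomputed from the class counts at a node. The sketch below is illustrative: the function name and the tie-breaking rule are assumptions, and the split of node 4 into 30 sc and 5 nsc frames is read from the predicted-SC column of Table 5b.

```python
def node_stats(n_sc, n_nsc, total_frames):
    """Response, Prob, N, and Correct for one terminal node."""
    n = n_sc + n_nsc
    if n == 0:
        # An empty node cannot predict a class.
        return {"response": None, "prob": 0.0, "n": 0, "correct": 0.0}
    response = "SC" if n_sc >= n_nsc else "NSC"   # tie goes to SC (assumption)
    return {
        "response": response,
        "prob": n / total_frames,
        "n": n,
        "correct": max(n_sc, n_nsc) / n,
    }


node1 = node_stats(n_sc=0, n_nsc=2324, total_frames=2429)
node4 = node_stats(n_sc=30, n_nsc=5, total_frames=2429)

print(node1["response"], round(node1["prob"], 3), node1["correct"])           # NSC 0.957 1.0
print(node4["response"], round(node4["prob"], 3), round(node4["correct"], 3)) # SC 0.014 0.857
```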

Tables 5b and 6b summarize the results classified in Tables 5a and 6a by class, that is, shot change (sc) frames and non-shot-change (nsc) frames, and give the classification accuracy for each class. For Mission Impossible, the classification accuracies for the sc and nsc classes are 1.0 and 0.998 respectively, and the overall accuracy is 0.999. For the documentary video, the corresponding values are 0.919, 0.997, and 0.958. These are higher than the cross-validation classification rates of 89.6% and 98.7% for the sc and nsc classes respectively. The binary classification tree therefore detects shot change points well even in sample sequences that played no part in generating the tree. Table 7 compares the results of the single features in these experiments with the performance of the method proposed in the present invention.
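The accuracies quoted above can be recomputed from the confusion counts. In the sketch below, taking the overall accuracy as the mean of the two class accuracies is an assumption, adopted because it reproduces the reported 0.999 and 0.958; the function name is hypothetical.

```python
def class_accuracies(sc_as_sc, sc_as_nsc, nsc_as_sc, nsc_as_nsc):
    """Per-class accuracy and their mean, from the four confusion counts."""
    sc_accuracy = sc_as_sc / (sc_as_sc + sc_as_nsc)
    nsc_accuracy = nsc_as_nsc / (nsc_as_sc + nsc_as_nsc)
    overall = (sc_accuracy + nsc_accuracy) / 2   # assumed definition
    return sc_accuracy, nsc_accuracy, overall


# Mission Impossible (Table 5b) and the documentary video (Table 6b).
mission = class_accuracies(sc_as_sc=30, sc_as_nsc=0, nsc_as_sc=5, nsc_as_nsc=2394)
documentary = class_accuracies(sc_as_sc=34, sc_as_nsc=3, nsc_as_sc=7, nsc_as_nsc=2672)

print([round(x, 3) for x in mission])      # [1.0, 0.998, 0.999]
print([round(x, 3) for x in documentary])  # [0.919, 0.997, 0.958]
```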

Table 7
                                Single detection parameter      Binary classification tree
Sequence             Parameter  recall   precision   EI         recall   precision   EI
The Matrix           HHD        94       66          80         100      63          81.5
Mission Impossible   LLR        100      97          98.5       100      99          99.5
Documentary          LHL        100      86          93         92       99          95.5

Table 7 shows an average improvement of about 2% in EI. For Mission Impossible, EI is similar for both approaches, but precision is improved by the method proposed in the present invention.
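The average improvement can be checked directly from the EI columns of Table 7. The short sketch below only restates the table values; the variable names are illustrative.

```python
# EI (%) from Table 7: best single parameter versus the binary classification tree.
ei_single = {"The Matrix": 80.0, "Mission Impossible": 98.5, "Documentary": 93.0}
ei_tree = {"The Matrix": 81.5, "Mission Impossible": 99.5, "Documentary": 95.5}

gains = [ei_tree[name] - ei_single[name] for name in ei_single]
average_gain = sum(gains) / len(gains)

print(gains)         # [1.5, 1.0, 2.5]
print(average_gain)  # about 1.67 percentage points, i.e. roughly 2%
```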

As described above, the present invention uses a plurality of shot change frame detection parameters and generates a binary classification tree so that the method can be applied readily to arbitrary videos. As a result, the user can rely on the detection performance when detecting shot change frames in videos produced with a variety of camera shooting and editing methods.

Although the present invention has been described with reference to the accompanying drawings and with a focus on preferred embodiments, it will be apparent to those skilled in the art that many obvious modifications can be made from this description without departing from the scope of the invention. The invention should therefore be construed according to the claims, which are written to cover such modifications.

FIG. 1 illustrates the process of detecting shot change frames in a video according to the present invention.

FIG. 2 is a set of graphs showing, for each shot change frame detection parameter, examples of shot change frames that were missed and of frames wrongly judged to be shot changes in a video according to the present invention.

FIG. 3 illustrates a binary classification tree according to an embodiment of the present invention.

Claims

A method of detecting shot change frames, comprising:

acquiring, from a training video, a feature change amount between two adjacent frames for each of a plurality of shot change frame detection parameters;

setting, for each shot change frame detection parameter, a threshold that maximizes the number of shot change frames;

generating a binary classification tree using the plurality of shot change frame detection parameters and the thresholds set for each detection parameter; and

detecting shot change frames in a test video using the generated binary classification tree.

The method of claim 1, wherein, in acquiring the feature change amount between two adjacent frames, the plurality of shot change frame detection parameters include:

a feature variable (LPD) for the full-area pixel brightness difference between two adjacent frames, a feature variable (IHD) for the full-area pixel brightness histogram difference, a feature variable (LHL) for the local-area pixel brightness histogram difference, a feature variable (HHD) for the full-area color histogram difference, and a feature variable (LLR) for the similarity of blocks over the full area using the brightness component.

The method of claim 1 or 2, wherein, in generating the binary classification tree:

a shot change frame detection parameter and a threshold are set in sequence at each split of the binary classification tree so that the number of frames classified as not being shot changes is maximized.