KR101833943B1

KR101833943B1 - Method and system for extracting and searching highlight image

Info

Publication number: KR101833943B1
Application number: KR1020160108637A
Authority: KR
Inventors: 김상욱; 민재식; 조성철; 박대현; 김봉섭
Original assignee: 네이버 주식회사
Priority date: 2016-08-25
Filing date: 2016-08-25
Publication date: 2018-04-13

Abstract

Disclosed are a method and a system for extracting and searching a highlight scene of a video based on context. The method, which is realized by a computer, comprises the steps of: selecting a candidate frame from a video; calculating a candidate value indicating feature information of images for each of the candidate frames; and extracting some candidate frames as representative images of the video based on the candidate value.

Description

METHOD AND SYSTEM FOR EXTRACTING AND SEARCHING HIGHLIGHT IMAGE

The following description relates to a technique for providing the main scenes of a video.

The rapid increase in users of high-speed communication networks enables the development of new services and the diversification of service items over those networks. Among the services that use such networks, video providing services are the most common.

Some of the image frames constituting a video are extracted and used as summary information for that video. Examples of extracting image frames include using the first or the last image frame of the video. To show how the video changes over time, a plurality of image frames may be used instead of a single image frame.

For example, Korean Patent No. 10-0547370 (registered on January 20, 2006), entitled "Apparatus and Method for Extracting Summary Images Using Object Shape Information, and Video Summarization and Indexing System Using the Same," discloses a technique of extracting a series of changing shapes and positions of an image object from a video and using them as representative frames.

Provided are a method and a system capable of extracting main scenes based on the context of a video.

Provided are a method and a system that offer the context-based main scenes of a video through a scene search function, so that the overall content and flow of the video can be easily grasped.

Provided is a computer-implemented method comprising: selecting candidate frames from a video; calculating, for each of the candidate frames, a candidate value indicating feature information of the corresponding image; and extracting some of the candidate frames as representative images of the video based on the candidate value.

According to one aspect, the calculating may calculate the candidate value for each of the candidate frames using a value indicating the time interval to an adjacent frame.

According to another aspect, the value indicating the time interval is given a larger value as the time interval between frames is longer, and the extracting may extract, as the representative images, a predetermined number of candidate frames ranked highest by candidate value, or the candidate frames whose candidate value is equal to or greater than a predetermined value.

According to yet another aspect, the calculating may calculate the candidate value for each of the candidate frames by further using at least one of a value indicating image quality and a value indicating feature information of a caption included in the image.

According to yet another aspect, the calculating may detect, for each of the candidate frames, a caption region in which a character string exists, and then calculate the candidate value using a feature value of the caption region.

According to yet another aspect, the caption region may be detected by applying ML-LBP (Multi Block Local Binary Pattern) to the candidate frame.

According to yet another aspect, the caption region may be detected by applying ML-LBP (Multi Block Local Binary Pattern) to the candidate frame, based on an LBP feature point weight that depends on the number of LBPs having different values and the number of pixels whose pixel value is equal to or greater than a threshold.

According to yet another aspect, the selecting may include: extracting key frames, or frames at a predetermined time interval, from the video; calculating a scene change value between frames for each of the extracted frames; and selecting at least some of the extracted frames as the candidate frames based on the scene change value.

According to yet another aspect, the method may further include providing a scene search function for the video using the representative images.

According to yet another aspect, the providing of the scene search function may configure the representative images as thumbnails for moving between playback sections.

Provided is a computer program stored in a computer-readable medium to execute a main scene providing method, the main scene providing method comprising: selecting candidate frames from a video; calculating, for each of the candidate frames, a candidate value indicating feature information of the corresponding image; extracting some of the candidate frames as representative images of the video based on the candidate value; and providing a scene search function for the video using the representative images.

Provided is a computer-implemented system comprising: a candidate selection unit that selects candidate frames from a video; and a representative extraction unit that calculates, for each of the candidate frames, a candidate value indicating feature information of the corresponding image and then extracts some of the candidate frames as representative images of the video based on the candidate value.

According to embodiments of the present invention, substantive main scenes can be extracted based on the context of a video.

According to embodiments of the present invention, the overall content and flow of a video can be easily grasped by providing the context-based main scenes of the video through a scene search function.

FIG. 1 is a block diagram illustrating an example of the internal configuration of a computer system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of components that a processor of a computer system according to an embodiment of the present invention may include.
FIG. 3 is a flowchart illustrating an example of a main scene providing method that a computer system according to an embodiment of the present invention may perform.
FIG. 4 is an exemplary diagram illustrating a process of setting a main region in an image according to an embodiment of the present invention.
FIGS. 5 to 9 are exemplary diagrams illustrating image quality measurement items according to an embodiment of the present invention.
FIG. 10 is a flowchart illustrating a series of processes for calculating a caption feature value according to an embodiment of the present invention.
FIGS. 11 to 14 are exemplary diagrams illustrating a process of detecting a caption region in an image according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Embodiments of the present invention relate to a technique for providing the main scenes of a video and, more particularly, to a method and a system that can extract main scenes based on the context of the video and provide the extracted main scenes through a scene search function.

Embodiments, including those specifically disclosed herein, achieve context-based main scene extraction and thereby provide significant advantages in terms of efficiency, convenience, diversity, accuracy, cost reduction, and the like.

The present invention is intended to extract main scenes from a video using several algorithms that recognize the context of the video, and to provide them through a scene search function.

The scene search function, one of the functions of a video player, lets the user move quickly and easily between the main scenes of a video. With this function, the user can navigate the video around the main scenes at which a scene change occurs, and can thereby easily grasp the overall content of the video.

Conventionally, a time-division scene extraction algorithm has been used for the scene search function. However, because such an algorithm extracts scenes mechanically at regular intervals, it cannot move to the scenes that are important in the context of the video.

In contrast, a scene search function based on the context of the video extracts main scenes according to weights derived from the feature information of the images and provides them for scene search, so it can reflect the overall flow of the video.

FIG. 1 is a block diagram illustrating an example of the internal configuration of a computer system according to an embodiment of the present invention. For example, a main scene providing system according to embodiments of the present invention may be implemented through the computer system 100 of FIG. 1. As shown in FIG. 1, the computer system 100 may include, as components for executing the main scene providing method, a processor 110, a memory 120, a persistent storage device 130, a bus 140, an input/output interface 150, and a network interface 160.

The processor 110 may include, or be part of, any device capable of processing a sequence of instructions. The processor 110 may include, for example, a computer processor, a processor in a mobile device or other electronic device, and/or a digital processor. The processor 110 may be included in, for example, a server computing device, a server computer, a series of server computers, a server farm, a cloud computer, a content platform, a mobile computing device, a smartphone, a tablet, a set-top box, a media player, or the like. The processor 110 may be connected to the memory 120 via the bus 140.

The memory 120 may include volatile, permanent, virtual, or other memory for storing information used by or output from the computer system 100. The memory 120 may include, for example, random access memory (RAM) and/or dynamic RAM (DRAM). The memory 120 may be used to store any information, such as state information of the computer system 100. The memory 120 may also be used to store instructions of the computer system 100, including, for example, instructions for providing main scenes. The computer system 100 may include one or more processors 110 as needed or where appropriate.

The bus 140 may include a communication infrastructure that enables interaction between the various components of the computer system 100. The bus 140 may carry data between components of the computer system 100, for example, between the processor 110 and the memory 120. The bus 140 may include wireless and/or wired communication media between the components of the computer system 100, and may include parallel, serial, or other topology arrangements.

The persistent storage device 130 may include components, such as memory or other persistent storage, used by the computer system 100 to store data for an extended period (for example, compared with the memory 120). The persistent storage device 130 may include non-volatile main memory as used by the processor 110 in the computer system 100. The persistent storage device 130 may include, for example, flash memory, a hard disk, an optical disk, or another computer-readable medium.

The input/output interface 150 may include interfaces to a keyboard, a mouse, voice command input, a display, or other input or output devices. Configuration commands and/or input for providing a video may be received through the input/output interface 150.

The network interface 160 may include one or more interfaces to networks such as a local area network or the Internet. The network interface 160 may include interfaces for wired or wireless connections. Configuration commands and/or services or content related to a video may be received through the network interface 160.

In other embodiments, the computer system 100 may include more components than those shown in FIG. 1. However, most conventional components need not be explicitly illustrated. For example, the computer system 100 may be implemented to include at least some of the input/output devices connected to the input/output interface 150 described above, or may further include other components such as a transceiver, a Global Positioning System (GPS) module, a camera, various sensors, and a database. As a more specific example, when the computer system 100 is implemented in the form of a mobile device such as a smartphone, it may further include the various components a smartphone generally has, such as an acceleration sensor, a gyro sensor, a camera, various physical buttons, touch-panel buttons, input/output ports, and a vibrator for vibration.

FIG. 2 is a diagram illustrating an example of components that a processor of a computer system according to an embodiment of the present invention may include, and FIG. 3 is a flowchart illustrating an example of a main scene providing method that a computer system according to an embodiment of the present invention may perform.

As shown in FIG. 2, the processor 110 may include a candidate selection unit 210, a representative extraction unit 220, and a scene search unit 230. These components of the processor 110 may be representations of different functions performed by the processor 110 according to control instructions provided by at least one program code. For example, the candidate selection unit 210 may be used as a functional representation of the processor 110 operating to control the computer system 100 to select candidate frames from a video. The processor 110 and its components may perform steps S310 to S340 of the main scene providing method of FIG. 3. For example, the processor 110 and its components may be implemented to execute instructions according to the code of the operating system contained in the memory 120 and the at least one program code described above. Here, the at least one program code may correspond to the code of a program implemented to process the main scene providing method.

The steps of the main scene providing method need not occur in the order shown; some steps may be omitted, or additional processes may be included.

In step S310, the processor 110 may load, into the memory 120, program code stored in a program file for the main scene providing method. For example, the program file for the main scene providing method may be stored in the persistent storage device 130 described with reference to FIG. 1, and the processor 110 may control the computer system 100 so that the program code is loaded into the memory 120, via the bus, from the program file stored in the persistent storage device 130.

Here, the processor 110 and each of the candidate selection unit 210, the representative extraction unit 220, and the scene search unit 230 included in the processor 110 may be different functional representations of the processor 110 that execute the corresponding portions of the program code loaded into the memory 120 in order to carry out the subsequent steps S320 to S340. To execute steps S320 to S340, the processor 110 and its components may directly process operations according to control instructions or may control the computer system 100.

In step S320, the candidate selection unit 210 may control the computer system 100 to select candidate frames from among the frames of the video from which main scenes are to be extracted. For example, the candidate selection unit 210 may select the key frames of the video as candidate frames, or may select frames extracted at a predetermined time interval (e.g., 1 second) as candidate frames. When key frames are used as candidate frames, a key frame whose time interval between frames is equal to or less than a predetermined time (e.g., 1 second) may be excluded from candidate selection. The candidate selection unit 210 may extract key frames, or frames at a predetermined interval, from the video, calculate scene change values between the extracted frames, such as the change in brightness and the change in color histogram, and then select at least some of the extracted frames as candidate frames based on the scene change values. For example, the candidate selection unit 210 may combine the inter-frame brightness change and the color histogram change at a predetermined ratio (e.g., 1:2) and, when the combined result is equal to or greater than a predetermined value, select the corresponding frame as a candidate frame.
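As a non-limiting illustration, the scene change test of step S320 might be sketched as follows. The function names, the normalization of both changes to the range 0 to 1, the number of histogram bins, and the threshold of 0.3 are assumptions made for this sketch and are not specified in the description; only the 1:2 combination ratio is taken from the example above.

```python
import numpy as np

def scene_change_score(prev, curr, bins=16, w_brightness=1.0, w_hist=2.0):
    """Combine brightness change and colour-histogram change (1:2 by default).

    prev, curr: H x W x C arrays with values in [0, 255].
    The normalisation used here is an assumption of this sketch.
    """
    prev = np.asarray(prev, dtype=np.float64)
    curr = np.asarray(curr, dtype=np.float64)
    # Brightness change: absolute difference of mean intensity, scaled to [0, 1].
    d_bright = abs(curr.mean() - prev.mean()) / 255.0
    # Histogram change: half the L1 distance between normalised
    # per-channel histograms, averaged over channels (also in [0, 1]).
    d_hist = 0.0
    for c in range(prev.shape[-1]):
        h_prev, _ = np.histogram(prev[..., c], bins=bins, range=(0, 256))
        h_curr, _ = np.histogram(curr[..., c], bins=bins, range=(0, 256))
        d_hist += 0.5 * np.abs(h_prev / h_prev.sum() - h_curr / h_curr.sum()).sum()
    d_hist /= prev.shape[-1]
    return (w_brightness * d_bright + w_hist * d_hist) / (w_brightness + w_hist)

def select_candidates(frames, threshold=0.3):
    """Indices of frames whose score against the previous frame meets the threshold."""
    return [i for i in range(1, len(frames))
            if scene_change_score(frames[i - 1], frames[i]) >= threshold]
```

In this sketch the first extracted frame has no predecessor and is therefore never selected; an implementation could equally treat it as a candidate by default.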

In step S330, the representative extraction unit 220 may calculate, for each candidate frame selected from the video, a candidate value indicating feature information of the corresponding image, and then extract some of the candidate frames as representative images based on the candidate value. For example, each candidate frame selected from the video has its own independent candidate value; the candidate frames may be sorted by candidate value, and a predetermined number of frames may be extracted as representative images in descending order of candidate value. As another example, a candidate frame may be extracted as a representative image when its candidate value is equal to or greater than a predetermined value. As one way of calculating a candidate value that carries the feature information of the image based on the context of the video, the representative extraction unit 220 may calculate the candidate value as a combination of at least one of a time-interval-based value between candidate frames, an image quality measurement value, and a caption feature value.
Here, the time-interval-based value indicates the time interval between candidate frames, the image quality measurement value is a measurement of the image quality of the candidate frame, and the caption feature value indicates feature information of a caption included in the image of the candidate frame. The factors that may be used to calculate the candidate value, namely the time-interval-based value between candidate frames, the image quality measurement value, and the caption feature value, are described again below.
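As a non-limiting illustration, the two extraction rules of step S330 (a fixed number of top-ranked frames, or all frames at or above a fixed value) might be sketched as follows. The function name, the dictionary keyed by timestamp, and the parameter names are hypothetical; how the candidate values themselves are computed is outside this sketch.

```python
def extract_representatives(candidate_values, top_n=None, min_value=None):
    """Pick representative frames by candidate value.

    candidate_values: dict mapping frame timestamp (seconds) -> candidate value.
    Exactly one of top_n or min_value must be given.
    """
    if (top_n is None) == (min_value is None):
        raise ValueError("give exactly one of top_n or min_value")
    if top_n is not None:
        # Rank timestamps by descending candidate value and keep the top N.
        ranked = sorted(candidate_values, key=candidate_values.get, reverse=True)
        chosen = ranked[:top_n]
    else:
        # Keep every frame whose candidate value reaches the threshold.
        chosen = [t for t, v in candidate_values.items() if v >= min_value]
    return sorted(chosen)  # back in playback order
```

Returning the timestamps in playback order is a design choice of this sketch, made so that the result can be used directly as a list of section boundaries.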

In step S340, the scene search unit 230 may provide a scene search function for the video using the representative images extracted from the video. The scene search unit 230 may select the representative images extracted based on the context of the video as the main scenes of the video and provide them through the scene search function. The scene search function includes moving between playback sections as well as searching the scenes of the video; the context-based representative images may be configured as the main scenes of the video and used as thumbnails for moving between sections.
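As a non-limiting illustration, moving between playback sections in step S340 might be sketched as a lookup in the sorted list of main scene timestamps. The function names and the use of timestamps in seconds are assumptions of this sketch; thumbnail rendering and player integration are not shown.

```python
from bisect import bisect_left, bisect_right

def next_scene(scene_times, position):
    """Timestamp of the first main scene strictly after `position`, or None."""
    i = bisect_right(scene_times, position)
    return scene_times[i] if i < len(scene_times) else None

def previous_scene(scene_times, position):
    """Timestamp of the last main scene strictly before `position`, or None."""
    i = bisect_left(scene_times, position)
    return scene_times[i - 1] if i > 0 else None
```

Here `scene_times` is assumed to be sorted in ascending order, for example the timestamps of the representative images in playback order.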

Each factor used to calculate the candidate value of a candidate frame is described in detail below.

(1) Time-interval-based value between candidate frames

The time-interval-based value between candidate frames indicates the time interval to an adjacent candidate frame, for example the previous candidate frame. The longer the interval between candidate frames, the larger the value assigned; conversely, the shorter the interval, the smaller the value assigned. When the interval between candidate frames is short, frames within the same scene, or frames with similar image information, are likely to be selected as representative images. To exclude such frames, the representative extraction unit 220 may calculate the candidate value using a time-interval-based value that grows as the interval between candidate frames grows.
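As a non-limiting illustration, a time-interval-based value that grows with the gap to the previous candidate frame might be sketched as follows. The linear mapping, the `saturation` parameter, and the choice to give the first candidate the maximum value are assumptions of this sketch; the description only requires that longer intervals receive larger values.

```python
def time_interval_values(timestamps, saturation=10.0):
    """Map each candidate's gap to the previous candidate onto [0, 1].

    timestamps: candidate frame times in seconds, in ascending order.
    saturation: gap (seconds) at or beyond which the value is 1.0 (hypothetical).
    """
    values = []
    prev = None
    for t in timestamps:
        # The first candidate has no predecessor; treat its gap as saturated.
        gap = saturation if prev is None else t - prev
        values.append(min(gap / saturation, 1.0))
        prev = t
    return values
```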

(2) Image quality measurement value

The representative extraction unit 220 may calculate the image quality measurement value as a combination of at least one of exposure, sharpness, vividness, low depth of field (low DOF), contrast, and prominence.

Among the image quality measurement items, the exposure value represents the average brightness of the main region of the image, and points are deducted when the region is too dark or too bright. Sharpness represents how clearly the main region of the image appears. Vividness is similar to sharpness, but a higher score is given as the primary color values and their proportion are higher. Low depth of field reflects the depth of field of the image, and a high score is given to images with a shallow depth of field or a simple background. The contrast value represents the contrast of the image, and a higher score is given as the contrast within the subject is stronger. Finally, prominence represents the contrast between the background and the subject, and a higher score is given as the brightness, color, and so on of the subject and the background are more clearly distinguished.

(2-1) Setting the main region of the image

First, to calculate the image quality measurement value, a main region needs to be set within the image of the candidate frame. The representative extraction unit 220 may find the main region (salient area) of the image using a saliency map model, which locates conspicuous regions of an image in a way similar to human vision. In other words, the main region is the part of the image with strong saliency, and it is likely to contain the main subject, captions, and the like. For example, the representative extraction unit 220 may apply Gaussian modeling to the distribution of the saliency map of the image and thereby set, as shown in FIG. 4, a main region 401 of the image 400 that contains the main subject, captions, and the like. A small main region 401 suggests an image that is strongly focused on its subject. Depending on the item, an image quality measurement may yield a more meaningful value when it is computed over the main region 401, which is a partial region, rather than over the entire image 400.
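As a non-limiting illustration, one way to read "Gaussian modeling of the saliency map distribution" is to fit a mean and standard deviation to the saliency mass along each axis and take a box of a few standard deviations around the mean. This interpretation, the function name, and the parameter `k` are assumptions of this sketch; the saliency map itself is taken as given, since the description does not fix a particular saliency model.

```python
import numpy as np

def main_region(saliency, k=2.0):
    """Return (top, left, bottom, right) covering mean +/- k*sigma of the saliency mass.

    saliency: 2-D array of non-negative saliency values.
    bottom and right are exclusive, so the region is saliency[top:bottom, left:right].
    """
    s = np.asarray(saliency, dtype=np.float64)
    h, w = s.shape
    total = s.sum()
    if total <= 0:
        return 0, 0, h, w  # no salient mass: fall back to the whole image
    ys, xs = np.mgrid[0:h, 0:w]
    # Saliency-weighted mean and standard deviation along each axis.
    my, mx = (s * ys).sum() / total, (s * xs).sum() / total
    sy = np.sqrt((s * (ys - my) ** 2).sum() / total)
    sx = np.sqrt((s * (xs - mx) ** 2).sum() / total)
    top = max(int(np.floor(my - k * sy)), 0)
    left = max(int(np.floor(mx - k * sx)), 0)
    bottom = min(int(np.ceil(my + k * sy)) + 1, h)
    right = min(int(np.ceil(mx + k * sx)) + 1, w)
    return top, left, bottom, right
```

A compact saliency distribution yields a small box, which matches the observation above that a small main region suggests an image focused on its subject.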

(2-2) Exposure value

The exposure value, one of the image quality metrics, is used to judge whether the exposure of an image is appropriate. For example, as shown in FIG. 5, the representative extraction unit 220 may calculate, for the main area 501 in the image 500, the arithmetic mean (Ma) and geometric mean (Mg) of the brightness values, as well as the contrast and the edge strength. The arithmetic mean (Ma) of the brightness values can be used as the base score. Because brightness differences between pixels are small in an underexposed or overexposed image, the geometric mean (Mg) of the brightness values may also be taken into account when calculating the exposure value of the main area. In addition, because color contrast and light-dark contrast are low in an underexposed or overexposed image, the contrast should be considered when calculating the exposure value of the main area; likewise, because edges tend to be weak in such an image, the edge strength should be considered as well.

For example, the exposure value can be calculated using Equation 1.

[Equation 1]

Exposure value = base × scale_factor_1 × scale_factor_2 × scale_factor_3

Here, base is a quadratic function of the arithmetic mean (Ma) of the brightness values; scale_factor_1 is a weight that grows as the geometric mean (Mg) of the brightness values becomes lower relative to the arithmetic mean (Ma); scale_factor_2 is a weight that grows as the contrast increases; and scale_factor_3 is a weight that grows as the edge strength increases.
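Equation 1 fixes only the multiplicative structure and the direction of each weight. As a rough sketch, the terms could be instantiated as below; the particular quadratic, the `1 + ...` form of each scale factor, and the use of standard deviation and neighbour differences as contrast and edge strength are all assumptions made for illustration.

```python
import math

def exposure_value(region, eps=1e-6):
    """Exposure score for a 2D list of brightness values in [0, 1].

    Only the product structure of Equation 1 comes from the description;
    every concrete functional form below is an illustrative assumption.
    """
    pixels = [v for row in region for v in row]
    n = len(pixels)
    ma = sum(pixels) / n                                       # arithmetic mean (Ma)
    mg = math.exp(sum(math.log(v + eps) for v in pixels) / n)  # geometric mean (Mg)

    # Assumed contrast measure: standard deviation of brightness.
    contrast = math.sqrt(sum((v - ma) ** 2 for v in pixels) / n)
    # Assumed edge strength: mean absolute difference of horizontal neighbours.
    diffs = [abs(row[x + 1] - row[x]) for row in region for x in range(len(row) - 1)]
    edge = sum(diffs) / len(diffs) if diffs else 0.0

    # Assumed quadratic in Ma: 1 at mid-gray, 0 at pure black or pure white.
    base = max(0.0, 1.0 - ((ma - 0.5) / 0.5) ** 2)
    scale_factor_1 = 1.0 + (1.0 - min(1.0, mg / (ma + eps)))  # grows as Mg falls below Ma
    scale_factor_2 = 1.0 + contrast                            # grows with contrast
    scale_factor_3 = 1.0 + edge                                # grows with edge strength
    return base * scale_factor_1 * scale_factor_2 * scale_factor_3
```

With these choices a uniformly black or white region scores 0, and a mid-brightness region with texture scores higher than a flat one.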

(2-3) Sharpness

Sharpness, another of the image quality metrics, is used to judge whether an image is clear and free of blurring. For example, the representative extraction unit 220 may calculate the sharpness of each pixel over the entire image or the main area, and then obtain the sharpness of the image as the average of the top fraction (for example, the top 1%) of the absolute per-pixel sharpness values. Pixel sharpness can be defined as pixel sharpness = |I - J|, where I is the original image and J is a blurred copy of the original image I.
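The sharpness measure above can be sketched in a few lines. The description does not name the blur used to produce J, so a 3x3 mean filter with edge replication is assumed here; `top_fraction` corresponds to the "top 1%" example.

```python
def box_blur(image):
    """3x3 mean blur with edge replication (the blur kernel is an assumption)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += image[yy][xx]
            out[y][x] = acc / 9.0
    return out

def sharpness(image, top_fraction=0.01):
    """Mean of the largest |I - J| values, where J is the blurred image."""
    blurred = box_blur(image)
    diffs = sorted(
        (abs(i - j)
         for row_i, row_j in zip(image, blurred)
         for i, j in zip(row_i, row_j)),
        reverse=True,
    )
    keep = max(1, int(len(diffs) * top_fraction))  # always keep at least one pixel
    return sum(diffs[:keep]) / keep
```

Averaging only the strongest responses keeps large smooth backgrounds from dragging down the score of an image whose subject is in focus.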

(2-4) Vividness

Vividness, another of the image quality metrics, indicates how strong the primary colors are and how large a proportion of the image they occupy. As shown in FIG. 6, the image is converted to the HSV (hue, saturation, value) color space 610, with hue component H, saturation component S, and value component V, and vividness is expressed as the product of the saturation component S and the value component V; the average over the top fraction of pixels (for example, the top 50%) can be used. The saturation component S can also be separated out in the HSV color space 610: for example, the representative extraction unit 220 may calculate the saturation component S of each pixel over the entire image or the main area, and use the average of the top fraction of these values as the vividness of the image.
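As an illustration of the S × V variant described above, the following sketch uses Python's standard `colorsys` module for the HSV conversion. The function name and the input layout (a flat sequence of RGB tuples in [0, 1]) are assumptions.

```python
import colorsys

def vividness(rgb_pixels, top_fraction=0.5):
    """Mean of the largest S*V products over the given pixels.

    rgb_pixels: iterable of (r, g, b) tuples with components in [0, 1].
    top_fraction: share of pixels kept, matching the "top 50%" example.
    """
    scores = []
    for r, g, b in rgb_pixels:
        _hue, saturation, value = colorsys.rgb_to_hsv(r, g, b)
        scores.append(saturation * value)
    scores.sort(reverse=True)
    keep = max(1, int(len(scores) * top_fraction))
    return sum(scores[:keep]) / keep
```

A pure primary color has S = V = 1 and therefore contributes the maximum score, while any gray pixel has S = 0 and contributes nothing.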

(2-5) Low depth of field (Low DOF)

Low depth of field, another of the image quality metrics, indicates how simple the background is, as in an image with a shallow depth of field (DOF). For example, the representative extraction unit 220 may set the main area in the image, quantify the relative size of that main area, and use the result as the low depth of field score. In other words, as shown in FIG. 7, the representative extraction unit 220 can calculate the low depth of field score by quantifying the size of the main area 701 relative to the size of the entire frame 700. Low depth of field can be defined as in Equation 2.

[Equation 2]

Low depth of field = 1 - {(size of the main area) / (total size of the image)}
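Equation 2 translates directly into code. The sketch below assumes that "size" means pixel area and that the main area is a rectangle; the function and parameter names are illustrative.

```python
def low_dof_score(main_width, main_height, frame_width, frame_height):
    """Equation 2: one minus the main area's share of the frame area."""
    frame_size = frame_width * frame_height
    if frame_size <= 0:
        raise ValueError("frame must have a positive area")
    main_size = main_width * main_height
    return 1.0 - main_size / frame_size
```

For a 1920x1080 frame with a 480x270 main area, the main area covers one sixteenth of the frame, so the score is 0.9375; a main area covering the whole frame scores 0.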

(2-6) Contrast value

The contrast value, another of the image quality metrics, indicates the degree of color contrast and light-dark contrast across the entire image. For example, as shown in FIG. 8, the representative extraction unit 220 may obtain the covariance matrix of the pixels in the CIE LAB color space 810, compute its trace (the sum of all diagonal elements, which equals the sum of the eigenvalues) as a magnitude, and use this as the contrast value. In the CIE LAB space, the Euclidean distance between colors reflects actual color differences better than in RGB space, and the stronger the contrast, the larger the volume occupied by the pixel cloud tends to be. In other words, the volume representing the contrast value can be computed by performing Gaussian modeling on the pixel cloud, that is, by obtaining the covariance matrix, and then summing its eigenvalues.
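Because the trace of a covariance matrix is simply the sum of the per-channel variances, the contrast value can be computed without building the full matrix. The sketch below assumes the pixels have already been converted to CIE LAB (the conversion itself is omitted) and uses the sample covariance with an n - 1 denominator, which is also an assumption.

```python
def contrast_value(lab_pixels):
    """Trace of the covariance matrix of a LAB pixel cloud.

    lab_pixels: list of (L, a, b) tuples, assumed to be in CIE LAB already.
    The trace equals the sum of the three channel variances, which in turn
    equals the sum of the eigenvalues of the covariance matrix.
    """
    n = len(lab_pixels)
    if n < 2:
        return 0.0
    trace = 0.0
    for channel in range(3):
        values = [pixel[channel] for pixel in lab_pixels]
        mean = sum(values) / n
        trace += sum((v - mean) ** 2 for v in values) / (n - 1)
    return trace
```

A widely spread pixel cloud yields a large trace, in line with the remark that stronger contrast tends to occupy a larger volume in the color space.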

(2-7) Prominence

Prominence, another of the image quality metrics, indicates how clearly the subject is distinguished from the background in color, brightness, and so on. For example, the representative extraction unit 220 may calculate prominence using the difference in image characteristics between the main area and the background within the entire frame. As shown in FIG. 9, the representative extraction unit 220 may divide the image 900 into a main area 901 containing the subject and a background area 903, and calculate the difference between the average pixel values of the main area 901 and the background area 903 in the CIE LAB color space, as well as the contrast and edge strength of each area. The average pixel value difference can be used as the base score. For example, prominence can be calculated using Equation 3.

[Equation 3]

Prominence = base × max(scale_factor_1, scale_factor_2)

Here, base is the average pixel value difference between the main area 901 and the background area 903; scale_factor_1 is a weight that grows as the ratio of contrast between the main area 901 and the background area 903 increases; and scale_factor_2 is a weight that grows as the ratio of edge strength between the main area 901 and the background area 903 increases.
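Equation 3 can be sketched as follows. The description gives only the structure base × max(...), so several choices here are assumptions: base is taken as the Euclidean distance between the mean LAB colors, each scale factor is the plain main-to-background ratio, and the contrast and edge strength of each area are passed in as precomputed numbers.

```python
import math

def mean_color(pixels):
    """Average (L, a, b) of a list of LAB pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def prominence(main_pixels, background_pixels,
               main_contrast, background_contrast,
               main_edge, background_edge, eps=1e-6):
    """Equation 3 with assumed forms for base and the two scale factors."""
    # Assumed base: distance between the mean colors of the two areas.
    base = math.dist(mean_color(main_pixels), mean_color(background_pixels))
    # Assumed weights: main-to-background ratios (eps avoids division by zero).
    scale_factor_1 = main_contrast / (background_contrast + eps)
    scale_factor_2 = main_edge / (background_edge + eps)
    return base * max(scale_factor_1, scale_factor_2)
```

Taking the maximum of the two weights means that either a contrast advantage or an edge-strength advantage of the subject over the background is enough to raise the score.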

Accordingly, the representative extraction unit 220 may calculate the image quality measurement value of a candidate image as a combination of at least one of the exposure value, sharpness, vividness, low depth of field, contrast value, and prominence.

(3) Caption feature value

Captions in an image convey information and draw the viewer's eye within the frame image, so the caption feature value can be used to extract representative images from the candidate frames. For example, the representative extraction unit 220 derives the caption area from the frame image and calculates feature values within that area; it may calculate the caption feature value as a combination of at least one of the size value, position value, motion value, color information, and frequency information of the caption, which characterize the caption area.

The representative extraction unit 220 can detect, within the entire area of the frame image, the caption area where a character string exists. Detecting the caption area amounts to classifying each pixel of the image as belonging to a text region or a non-text region. To do so, the representative extraction unit 220 may use a machine learning method that trains a classifier on various features that characterize caption areas, or an expert system in which rules for distinguishing caption areas from non-caption areas are defined. For example, compared with non-caption areas, caption areas show abrupt changes in pixel intensity and a dense, regular distribution of outlines, so the two can be separated using features computed on the local image around the pixel being classified, such as the mean difference of pixel intensities, the standard deviation, or the histogram of oriented gradients. After the per-pixel caption/non-caption classification, adjacent pixels classified as caption are grouped to obtain the final caption area corresponding to a character string. Once the caption area has been detected in the frame image, the representative extraction unit 220 may calculate its size value, position value, motion value, color information, frequency information, and the like, and obtain the caption feature value as a combination of these values.

FIG. 10 is a flowchart illustrating a series of operations for deriving a caption area from a frame image and calculating the caption feature value. The caption feature value calculation process of FIG. 10 may be included in step S330 of the main scene providing method described with reference to FIG. 3.

In step S1001, the representative extraction unit 220 may control the computer system 100 to receive a candidate frame.

In step S1002, the representative extraction unit 220 may convert the color image corresponding to the candidate frame into a brightness image (for example, a grayscale image). For example, the representative extraction unit 220 may apply an illuminant correction algorithm to the color image to correct its light source, and then convert the corrected color image into a grayscale image.

In step S1003, the representative extraction unit 220 may perform color-change-based binarization on the brightness image. That is, the representative extraction unit 220 may generate a color-change-based binary image by detecting color-change pixels in the brightness image.

In step S1004, the representative extraction unit 220 may perform edge-detection-based binarization on the brightness image. For example, the representative extraction unit 220 may generate an edge-detection-based binary image by detecting edges in the brightness image using the Canny algorithm or a similar method.

In step S1005, the representative extraction unit 220 may obtain a composite image by combining the color-change-based binary image and the edge-detection-based binary image.
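The description does not say how the two binary images are combined in step S1005. As a small sketch, a pixel-wise OR (keep a pixel if either detector fired) and a pixel-wise AND (keep it only if both fired) are the two obvious candidates; both are offered here as assumptions, with illustrative names.

```python
def combine_binary(color_change_mask, edge_mask, mode="or"):
    """Combine two equally sized binary images (2D lists of 0/1).

    mode="or" keeps pixels flagged by either mask; mode="and" keeps
    pixels flagged by both. Which one applies is an assumption.
    """
    if mode not in ("or", "and"):
        raise ValueError("mode must be 'or' or 'and'")
    combined = []
    for row_a, row_b in zip(color_change_mask, edge_mask):
        if mode == "or":
            combined.append([a | b for a, b in zip(row_a, row_b)])
        else:
            combined.append([a & b for a, b in zip(row_a, row_b)])
    return combined
```

OR tends to produce more candidate pixels for the later region-of-interest step, while AND is stricter and yields fewer false positives.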

In step S1006, the representative extraction unit 220 may detect, in the composite image, candidate areas, which are regions of interest (ROI) for caption detection.

In step S1008, the representative extraction unit 220 may detect, based on ML-LBP (Multi Block Local Binary Pattern), the caption area within the candidate areas where a character string actually exists. Here, the representative extraction unit 220 may apply a feature extraction technique (for example, a wavelet transform) to the input frame to obtain ML-LBP feature points and use them for caption area detection (S1007).

The wavelet transform used in image processing can be thought of simply as subband decomposition. In other words, it is a method that uses a low-pass filter and a high-pass filter to split the frequency range of an image into bands and encodes each band separately. Splitting the image into bands with these filters is called wavelet decomposition. Because an image is a two-dimensional signal, it is decomposed by applying the low-pass and high-pass filters along both the horizontal and vertical directions. This yields four different bands. As shown in FIG. 11, the upper-left image is low-frequency in both the horizontal and vertical directions (LL band); the lower-left image is low-frequency horizontally and high-frequency vertically (LH band); the upper-right image is high-frequency horizontally and low-frequency vertically (HL band); and the lower-right image is high-frequency in both directions (HH band).

Because of the nature of images, almost all of the information is contained in the LL band, so the LL band can be treated as a new image and wavelet decomposition can be applied to it again. When only the LL band of a wavelet-decomposed image is decomposed repeatedly in this way, four different bands arise within the low-frequency band, as shown in FIG. 12.
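The four-band split can be illustrated with the simplest wavelet, the Haar wavelet. The description does not say which wavelet is used, so Haar is an assumption, as is the averaging normalization (dividing by 4 rather than 2). Band names follow the convention in the text: LH is low-frequency horizontally and high-frequency vertically.

```python
def haar_decompose(image):
    """One level of 2D Haar decomposition on a 2D list with even dimensions.

    Returns (LL, LH, HL, HH), each half the width and height of the input.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("image dimensions must be even")
    ll, lh, hl, hh = [], [], [], []
    for y in range(0, height, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for x in range(0, width, 2):
            a, b = image[y][x], image[y][x + 1]          # top row of the 2x2 block
            c, d = image[y + 1][x], image[y + 1][x + 1]  # bottom row of the block
            row_ll.append((a + b + c + d) / 4.0)  # low horizontal, low vertical
            row_lh.append((a + b - c - d) / 4.0)  # low horizontal, high vertical
            row_hl.append((a - b + c - d) / 4.0)  # high horizontal, low vertical
            row_hh.append((a - b - c + d) / 4.0)  # high in both directions
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh
```

Repeated decomposition of the LL band is then just a second call on the first result, for example `ll2, lh2, hl2, hh2 = haar_decompose(ll)`, as long as the LL band still has even dimensions.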

The representative extraction unit 220 may calculate an LBP feature point weight for each candidate area and then determine that an area whose weight is at or above a certain value is a caption area. For example, as shown in FIG. 13, a pixel is compared with its eight neighboring pixels; a neighbor whose value is larger than that of the reference pixel is assigned 1, and a neighbor whose value is smaller is assigned 0. Expressed in this binary form, a total of 256 patterns are possible. Accordingly, the representative extraction unit 220 may calculate the LBP feature point weight by applying LBP to all pixels of the LH and HL bands of the wavelet-decomposed image. For example, the LBP feature point weight can be defined as in Equation 4.

[Equation 4]

Weight = λ × {nLBP / 256}

Here, 256 is the number of patterns that can be expressed between pixels, nLBP is the number of distinct LBP values in the candidate area (high for complex areas), and λ is a refinement coefficient representing the number of effective pixels. For example, the refinement coefficient can be defined as in Equation 5.

[Equation 5]

λ = {PxCnt(p_i, HL) + PxCnt(p_i, LH)} / {2N × κ}

Here, N is the number of pixels, i is the pixel position [0, N], PxCnt() is the number of pixels at or above a threshold, and κ is a constant representing image complexity.
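Equations 4 and 5 can be put together in a short sketch. Several details are not fixed by the text and are assumed here: a neighbor equal to the reference pixel is coded as 0, N is the number of pixels in one band, nLBP is counted over the distinct codes from both bands, and border pixels without eight neighbors are skipped. All function names are illustrative.

```python
# Offsets (dy, dx) of the 8 neighbours, clockwise from the top-left.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(band):
    """8-bit LBP code of every interior pixel of a 2D list."""
    h, w = len(band), len(band[0])
    codes = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            code = 0
            for bit, (dy, dx) in enumerate(NEIGHBOURS):
                # Neighbour larger than the reference pixel -> 1, otherwise 0.
                if band[y + dy][x + dx] > band[y][x]:
                    code |= 1 << bit
            codes.append(code)
    return codes

def count_at_or_above(band, threshold):
    """PxCnt: number of pixels whose value is at or above the threshold."""
    return sum(1 for row in band for v in row if v >= threshold)

def lbp_weight(lh_band, hl_band, threshold, kappa=1.0):
    """Equation 4, with the refinement coefficient from Equation 5."""
    n = sum(len(row) for row in lh_band)  # N, assumed to be pixels per band
    lam = (count_at_or_above(hl_band, threshold)
           + count_at_or_above(lh_band, threshold)) / (2 * n * kappa)
    n_lbp = len(set(lbp_codes(lh_band)) | set(lbp_codes(hl_band)))
    return lam * (n_lbp / 256)
```

A candidate area would then be accepted as a caption area when `lbp_weight(...)` reaches a chosen cutoff; that cutoff, like `threshold` and `kappa`, has to be tuned and is not given in the description.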

Thus, as shown in FIG. 14, the representative extraction unit 220 can use ML-LBP to detect, within the candidate frame 1400, the caption area 1405 where a character string actually exists.

Returning to FIG. 10, in step S1009 the representative extraction unit 220 may calculate at least one of the feature values of the caption area where the character string exists, namely the size value, position value, motion value, color information, and frequency information, and then calculate the caption feature value as a combination of them. For example, the representative extraction unit 220 may use the LBP feature point weight of the caption area detected in the candidate frame as the caption feature value of that image.

For each candidate frame, the representative extraction unit 220 may calculate the candidate value using a value based on the time interval between candidate frames, and may additionally use at least one of the image quality measurement value and the caption feature value together with the time-interval-based value. When two or more of the time-interval-based value, the image quality measurement value, and the caption feature value are combined to calculate the candidate value, the factors may be combined in equal proportions, or the combination ratio (weight) of each factor may differ according to the characteristics of the video, the importance of each factor, and so on.

Thus, in the present invention, candidate frames are selected from a video, a candidate value containing feature information of the image is calculated for each selected candidate frame, and some of the candidate frames are extracted as representative images based on the candidate value.
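The combination and selection just described can be sketched as a weighted sum followed by a top-N pick. The weighted-sum form, the default weights, and the function names are assumptions; the description only says that the factors are combined in equal or differing proportions.

```python
def candidate_value(time_gap, quality=0.0, caption=0.0, weights=(1.0, 1.0, 1.0)):
    """Combine the three factors with per-factor weights (assumed weighted sum)."""
    w_time, w_quality, w_caption = weights
    return w_time * time_gap + w_quality * quality + w_caption * caption

def pick_representatives(candidates, count):
    """Return the ids of the `count` frames with the highest candidate values.

    candidates: list of (frame_id, candidate_value) pairs.
    """
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    return [frame_id for frame_id, _value in ranked[:count]]
```

For instance, with hypothetical frame ids, `pick_representatives([("f1", 0.2), ("f2", 0.9), ("f3", 0.5)], 2)` keeps `f2` and `f3`. Selecting every frame above a fixed cutoff instead of a fixed count would be an equally simple variant.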

As described above, according to embodiments of the present invention, substantive main scenes can be extracted based on the context of a video, and these context-based main scenes can be provided through a scene search function, making it easy to grasp the overall content and flow of the video.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single unit, but a person of ordinary skill in the art will recognize that it may include multiple processing elements and/or multiple types of processing elements. For example, the processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a different form from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

A computer-implemented method, comprising:
selecting candidate frames from a video;
calculating, for each of the candidate frames, a candidate value indicating feature information of the image; and
extracting some of the candidate frames as representative images of the video based on the candidate value,
wherein the calculating comprises:
setting a salient area according to a distribution of a saliency map in the image of the candidate frame; and
calculating the candidate value using a feature value for the salient area.

The method according to claim 1,
wherein the calculating comprises:
calculating, for each of the candidate frames, the candidate value using a value indicating a time interval between adjacent frames.

3. The method of claim 2,
wherein the value indicating the time interval is larger as the time interval between frames is longer, and
wherein the extracting comprises:
extracting, as the representative images, a predetermined number of candidate frames, or the candidate frames whose candidate values are equal to or greater than a predetermined value, based on the candidate values.

4. The method of claim 2,
wherein the calculating comprises:
calculating, for each of the candidate frames, the candidate value by further using at least one of a value indicating image quality and a value indicating feature information of a caption included in the image.

The method according to claim 1,
wherein the calculating comprises:
detecting, for each of the candidate frames, a caption area in which a character string exists, and then calculating the candidate value using a feature value of the caption area.

6. The method of claim 5,
wherein detecting the caption area comprises:
detecting the caption area using ML-LBP (Multi-Block Local Binary Pattern) for the candidate frame.

7. The method of claim 5,
wherein detecting the caption area comprises:
detecting the caption area, using ML-LBP (Multi Block Local Binary Pattern) for the candidate frame, through an LBP feature point weight based on the number of LBPs having different values and the number of pixels whose pixel values are equal to or greater than a threshold.

The method according to claim 1,
wherein the selecting comprises:
extracting key frames, or frames at predetermined time intervals, from the video;
calculating a scene change value between frames for each of the extracted frames; and
selecting at least some of the extracted frames as the candidate frames based on the scene change value.

The method according to claim 1, further comprising:
providing a scene search function for the video using the representative images.

10. The method of claim 9,
wherein providing the scene search function comprises:
configuring the representative images as thumbnails for navigating within the video.

A computer program stored in a computer-readable medium to execute a main scene providing method,
the main scene providing method comprising:
selecting candidate frames from a video;
calculating, for each of the candidate frames, a candidate value indicating feature information of the image;
extracting some of the candidate frames as representative images of the video based on the candidate value; and
providing a scene search function for the video using the representative images,
wherein the calculating comprises:
setting a salient area according to a distribution of a saliency map in the image of the candidate frame; and
calculating the candidate value using a feature value for the salient area.

A computer-implemented system comprising:
a candidate selecting unit for selecting a candidate frame in a moving image; and
a representative extracting unit for calculating a candidate value indicating feature information of the image for each of the candidate frames, and extracting some of the candidate frames as a representative image of the moving image based on the candidate value,
wherein the representative extracting unit
sets a salient region according to a distribution of a saliency map in an image of the candidate frame, and
calculates the candidate value using a feature value of the salient region.
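
To make the salient-region step concrete, here is a minimal Python sketch. It assumes the saliency map is already available as a 2D grid of non-negative scores, takes the salient region to be the bounding box of cells at or above a fraction `ratio` of the peak score, and uses the intensity standard deviation inside that box as the feature value. All three choices are assumptions; the claim only requires that the region follow the distribution of the saliency map.

```python
from typing import Sequence, Tuple

Grid = Sequence[Sequence[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def salient_region(saliency: Grid, ratio: float = 0.5) -> Box:
    """Bounding box of all cells whose saliency is at least ratio * peak."""
    peak = max(max(row) for row in saliency)
    cut = ratio * peak
    cells = [(r, c) for r, row in enumerate(saliency)
             for c, v in enumerate(row) if v >= cut]
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


def region_contrast(img: Grid, box: Box) -> float:
    """Hypothetical feature value: standard deviation of intensities in the box."""
    top, left, bottom, right = box
    pixels = [img[r][c] for r in range(top, bottom + 1)
              for c in range(left, right + 1)]
    mean = sum(pixels) / len(pixels)
    return (sum((p - mean) ** 2 for p in pixels) / len(pixels)) ** 0.5


def candidate_value(img: Grid, saliency: Grid, ratio: float = 0.5) -> float:
    """Candidate value computed only from the salient region of the frame."""
    return region_contrast(img, salient_region(saliency, ratio))
```

Restricting the feature to the salient region keeps background clutter from dominating the score, which is the apparent intent of computing the value on that region rather than on the whole frame.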

The system of claim 12,
wherein the representative extracting unit
calculates the candidate value using a value indicating a time interval between adjacent frames for each of the candidate frames.

The system of claim 13,
wherein the value indicating the time interval is larger the longer the time interval between frames, and
the representative extracting unit
extracts, as the representative image, a predetermined number of candidate frames or the candidate frames whose candidate values are equal to or greater than a predetermined value, based on the candidate values.
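
The time-interval value and the two extraction rules described in the claim above could look like the following Python sketch. It assumes the interval of a candidate is the gap to the preceding candidate (the first candidate is measured from time zero) and normalises by the longest gap; `top_n` and `min_value` are hypothetical parameter names for the "predetermined number" and "predetermined value".

```python
from typing import List, Optional, Sequence


def interval_values(timestamps: Sequence[float]) -> List[float]:
    """Value per candidate that grows with the gap to the previous candidate.
    `timestamps` must be sorted in ascending order (seconds)."""
    if not timestamps:
        return []
    gaps = [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]
    longest = max(gaps) or 1.0  # avoid dividing by zero when all gaps are 0
    return [g / longest for g in gaps]


def extract_representatives(values: Sequence[float],
                            top_n: Optional[int] = None,
                            min_value: Optional[float] = None) -> List[int]:
    """Return candidate indices chosen either as the `top_n` highest values
    or as every candidate whose value is at least `min_value`."""
    if top_n is not None:
        ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        return sorted(ranked[:top_n])  # restore chronological order
    if min_value is not None:
        return [i for i, v in enumerate(values) if v >= min_value]
    raise ValueError("either top_n or min_value must be given")
```

Favouring candidates that follow a long gap spreads the representative images across the timeline instead of clustering them in one busy stretch of the moving image.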

The system of claim 13,
wherein the representative extracting unit
calculates the candidate value by further using, for each of the candidate frames, at least one of a value indicating image quality and a value indicating feature information of a caption included in the image.

The system of claim 12,
wherein the representative extracting unit
detects a caption region in which a character string exists, and then calculates the candidate value for each of the candidate frames using a feature value of the caption region.

The system of claim 16,
wherein the representative extracting unit
determines an LBP feature point weight, using ML-LBP (Multi-Block Local Binary Pattern) on the candidate frame, based on the number of LBPs having different values and the number of pixels whose pixel values are equal to or greater than a threshold value.

The system of claim 12,
wherein the candidate selecting unit
extracts key frames, or frames at a predetermined time interval, from the moving image, calculates a scene change value between frames for each of the extracted frames, and selects at least some of the extracted frames as the candidate frame based on the scene change value.

The system of claim 12, further comprising:
a scene search unit for providing a scene search function for the moving image using the representative image.

The system of claim 19,
wherein the scene search unit
configures the representative image as a thumbnail for navigating the moving image.
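
One way to picture the thumbnail-based scene search is the Python sketch below. It assumes each representative image is identified by its frame index and that the frame rate `fps` is known; the `Thumbnail` type and the function names are hypothetical and are not taken from the patent.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Thumbnail:
    frame_index: int   # index of the representative frame in the moving image
    seconds: float     # playback position the thumbnail jumps to


def build_scene_index(frame_indices: Iterable[int], fps: float) -> List[Thumbnail]:
    """Turn representative frame indices into thumbnails ordered by time."""
    return [Thumbnail(i, i / fps) for i in sorted(frame_indices)]


def seek_target(index: List[Thumbnail], thumb_no: int) -> float:
    """Playback position (seconds) to move to when thumbnail `thumb_no` is chosen."""
    return index[thumb_no].seconds


def current_scene(index: List[Thumbnail], playback_seconds: float) -> int:
    """Position of the thumbnail whose scene contains the current playback time."""
    pos = bisect_right([t.seconds for t in index], playback_seconds) - 1
    return max(pos, 0)
```

A player could render the list returned by `build_scene_index` as a strip of thumbnails and call `seek_target` on selection, which corresponds to using the representative image as a thumbnail for navigating the moving image.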