KR102742878B1 - Apparatus and method for editing an image data using an artificial intelligence automatically in the image editing apparatus

Info

Publication number: KR102742878B1
Application number: KR1020210182517A
Authority: KR
Inventors: 성인호
Original assignee: 주식회사 에스제이테크놀로지
Priority date: 2021-12-20
Filing date: 2021-12-20
Publication date: 2024-12-16
Anticipated expiration: 2041-12-20
Also published as: KR20230093683A

Abstract

According to one embodiment of the present invention, a device and method for automatically editing image data using artificial intelligence in an image editing device include: an image collection unit that collects and stores a plurality of training image data; an image input unit that receives and stores a plurality of original image data; a scene learning database that stores a plurality of scene learning image data by image type; an editing learning database that stores a plurality of editing learning image data by image type; an artificial intelligence engine learning unit that selects, from among the training image data, a plurality of training image data corresponding to a specific image type input by a user, and trains an artificial intelligence engine using the selected training image data, a pre-designated scene learning parameter, and a pre-designated editing learning parameter; an image synchronization unit that synchronizes a reference point of the original image data based on a pre-designated specific sound; a scene analysis unit that analyzes a plurality of scenes in the synchronized original image data using the scene learning image data; and an editing analysis unit that applies the editing learning image data to an artificial intelligence program using the artificial intelligence engine to select a plurality of scenes from among the analyzed scenes, and edits the selected scenes to generate edited image data.

Description

{Apparatus and method for editing an image data using an artificial intelligence automatically in the image editing apparatus}

The present invention relates to an image editing device and, more particularly, to a device and method for automatically editing image data using artificial intelligence in an image editing device.

With the development of the Internet and networks, media production now takes place actively not only in mass media broadcasting by large broadcasting stations but also in one-person media broadcasting. This expansion of media broadcasting has led to a rapid increase in video editing.

In particular, to distribute a video through media such as broadcasting or a social network service (SNS), the video must be edited according to its purpose. Here, the purpose of the video includes advertising or a highlight summary of the broadcast.

However, because such video editing is performed manually by a user (for example, an editor), it consumes a great deal of time and effort. For example, when the editing is cross-editing, even an experienced editor may need 5 to 10 hours or more to produce a cross-edited video of less than 5 minutes. Here, a cross-edited video is a secondary work in which an editor freely edits several stage videos of an idol group into a single video in which the flow of the song and choreography is continuous, and such videos are very popular on video platforms such as YouTube.

In addition, the production quality of the edited result varies greatly depending on the skill of the editor.

Therefore, a solution to these problems is needed.

One embodiment of the present invention proposes a device and method for automatically editing image data in an image editing device in a short time and with little effort.

One embodiment of the present invention also proposes a device and method for automatically editing image data in an image editing device so that the result has consistent quality.

According to one embodiment of the present invention, a device for automatically editing image data using artificial intelligence in an image editing device includes: an image collection unit that collects and stores a plurality of training image data; an image input unit that receives and stores a plurality of original image data; a scene learning database that stores a plurality of scene learning image data by image type; an editing learning database that stores a plurality of editing learning image data by image type; an artificial intelligence engine learning unit that selects, from among the training image data, a plurality of training image data corresponding to a specific image type input by a user, and trains an artificial intelligence engine using the selected training image data, a pre-designated scene learning parameter, and a pre-designated editing learning parameter; an image synchronization unit that synchronizes a reference point of the original image data based on a pre-designated specific sound; a scene analysis unit that analyzes a plurality of scenes in the synchronized original image data using the scene learning image data; and an editing analysis unit that applies the editing learning image data to an artificial intelligence program using the artificial intelligence engine to select a plurality of scenes from among the analyzed scenes, and edits the selected scenes to generate edited image data.

According to one embodiment of the present invention, a method for automatically editing image data using artificial intelligence in an image editing device includes: collecting and storing, by an image collection unit, a plurality of training image data; selecting, by an artificial intelligence engine learning unit, a plurality of training image data corresponding to a specific image type input by a user from among the training image data; training, by the artificial intelligence engine learning unit, an artificial intelligence engine using the selected training image data, a pre-designated scene learning parameter, and a pre-designated editing learning parameter; receiving and storing, by an image input unit, a plurality of original image data; synchronizing, by an image synchronization unit, a reference point of the original image data based on a pre-designated specific sound; analyzing, by a scene analysis unit, a plurality of scenes in the synchronized original image data using the scene learning image data; selecting, by an editing analysis unit, a plurality of scenes from among the analyzed scenes by applying the editing learning image data to an artificial intelligence program using the artificial intelligence engine; and editing, by the editing analysis unit, the selected scenes to generate edited image data.

One embodiment of the present invention automatically edits image data using artificial intelligence in an image editing device, so that image data can be edited in a short time and with little effort.

One embodiment of the present invention also automatically edits image data using artificial intelligence in an image editing device, so that the edited result has consistent quality.

Other effects that can be obtained or expected from the embodiments of the present invention are disclosed, directly or implicitly, in the detailed description of the embodiments below.

FIG. 1 is a block diagram of an image editing device according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an operation of synchronizing reference points of a plurality of image data in an image synchronization unit according to one embodiment of the present invention.
FIG. 3 is a diagram illustrating an operation of editing a plurality of image data in an editing analysis unit according to an embodiment of the present invention.
FIG. 4 is a flowchart of automatically editing image data in an image editing device according to an embodiment of the present invention.
FIG. 5 is a flowchart of learning an artificial intelligence engine in an artificial intelligence engine learning unit according to one embodiment of the present invention.

The terms used in this specification are briefly explained first, and the present invention is then described in detail.

The terms used in the embodiments of the present invention are, as far as possible, general terms in wide current use, chosen with the functions of the present invention in mind. They may nevertheless vary with the intent of those working in the field, with precedent, or with the emergence of new technologies. In certain cases a term has been chosen arbitrarily by the applicant, and its meaning is then described in detail in the corresponding part of the description. The terms used in the present invention should therefore be defined by their meaning and by the overall content of the present invention, not simply by their names.

The embodiments of the present invention may be modified in various ways and may take various forms, so specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the scope to those specific embodiments; the scope should be understood to include all modifications, equivalents, and substitutes that fall within the disclosed idea and technical scope. In describing the embodiments, a detailed description of related known technology is omitted where it might obscure the main point.

Terms such as first and second may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

A singular expression includes the plural unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "consist of" specify the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification, and should be understood not to exclude in advance the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In an embodiment of the present invention, a 'module' or 'unit' performs at least one function or operation and may be implemented in hardware, in software, or in a combination of hardware and software. In addition, a plurality of 'modules' or 'units' may be integrated into at least one module and implemented by at least one processor (not shown), except for a 'module' or 'unit' that needs to be implemented in specific hardware.

In the embodiments of the present invention, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. In addition, when a part is said to "include" a component, this means that it may further include other components, not that other components are excluded, unless specifically stated otherwise.

Hereinafter, embodiments of the present invention are described in detail with reference to the attached drawings so that a person of ordinary skill in the art to which the present invention pertains can readily practice them. The present invention may, however, be implemented in various different forms and is not limited to the embodiments described here. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

FIG. 1 is a block diagram of an image editing device according to an embodiment of the present invention.

Referring to FIG. 1, the image editing device (101) includes an image collection unit (103), an artificial intelligence (hereinafter 'AI') engine learning unit (105), a scene learning database (hereinafter 'DB') (107), an editing learning database (109), an image input unit (111), an image synchronization unit (113), a scene analysis unit (115), and an editing analysis unit (117).

Looking at each component, the image collection unit (103) collects and stores a plurality of externally produced training image data (125). For example, the training image data are image data that have already been edited, and they may be classified and stored according to their type. For example, the type of the training image data may be determined by the genre of the image data. For example, the genre may include entertainment image data, performance image data (for example, stage image data), review image data (for example, car, real estate, or toy review image data), sports image data (for example, on-site or broadcast sports image data), and highlight image data (for example, video, drama, sports (for example, scoring scene highlights), entertainment, or documentary highlight image data). For example, the selected training image data may all have the same length.

The AI engine learning unit (105) trains the AI engine using the classified image data selected by the user from among the plurality of training image data (125). For example, the AI engine may be software that acts as the brain of an AI program. For example, the AI program may be software that provides a function of editing a plurality of image data to generate a single image data. For example, the AI program may be a ResNeXt-101 model.

In more detail, the AI engine learning unit (105) receives from the user an image type to be used for training the AI engine. The AI engine learning unit (105) then selects, from among the plurality of training image data, a plurality of training image data corresponding to the input image type.
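The selection step described above can be sketched as a simple filter. This is only an illustration under assumptions: the `TrainingClip` record, its field names, and the type labels are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingClip:
    """One already-edited training clip (hypothetical record layout)."""
    clip_id: str
    image_type: str  # hypothetical labels, e.g. "performance", "review"


def select_by_type(clips, image_type):
    """Return the clips whose type matches the type entered by the user."""
    return [clip for clip in clips if clip.image_type == image_type]


clips = [
    TrainingClip("a", "performance"),
    TrainingClip("b", "review"),
    TrainingClip("c", "performance"),
]
selected = select_by_type(clips, "performance")
```

Here `selected` holds only the clips labeled with the requested type, which would then be passed on to training.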

The AI engine learning unit (105) then trains the AI engine by applying pre-designated scene learning parameters to the selected training image data. For example, the scene learning parameters may serve to extract the characteristics of good scenes from the selected training image data.

For example, the scene learning parameters may include at least one of the number of speakers, object movement, screen composition, camera movement, and audio syllables.

For example, the number of speakers may include selecting a scene in which the single speaker appears when there is one speaker, and selecting a scene that contains multiple speakers when there are two or more speakers. For example, object movement may include a rule that treats a section in which the speed of an object moving on the screen changes as a unit scene, learning scenes in which an object moves more than a certain distance away from the center of the screen, and learning scenes in which, among multiple moving objects, one object moves differently from the others.
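One possible reading of the speed-change rule above is to cut a new unit scene wherever the object's speed jumps between consecutive frames. The sketch below follows that reading only; the per-frame speed input, its unit, and the `min_change` threshold are assumptions, since the specification gives no values.

```python
def split_on_speed_change(speeds, min_change):
    """Split frame indices into unit scenes wherever the speed jumps.

    speeds: per-frame object speed (hypothetical unit, e.g. pixels per frame).
    min_change: smallest frame-to-frame difference treated as a change
                (an assumed threshold, not taken from the specification).
    Returns a list of (start, end) index pairs with end exclusive.
    """
    if not speeds:
        return []
    segments = []
    start = 0
    for i in range(1, len(speeds)):
        if abs(speeds[i] - speeds[i - 1]) >= min_change:
            segments.append((start, i))
            start = i
    segments.append((start, len(speeds)))
    return segments


segments = split_on_speed_change([1.0, 1.1, 1.0, 4.0, 4.2, 0.5], min_change=2.0)
```

With these sample values the sequence is cut at frame 3 and again at frame 5, giving three unit scenes.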

For example, the screen composition may include a composition determined by the central object and a composition determined by the camera angle. For example, the composition determined by the central object may be divided into person-centered and non-person-centered compositions. For example, a person-centered composition may include a bust shot, a full shot, or a close-up (for example, centered on the face). For example, a non-person-centered composition depends on whether an object (for example, a non-person) is included; when an object is included, priority may be given to a full shot with the object at the center of the screen. For example, the composition determined by the camera angle may mean selecting a scene according to whether it uses an unusual angle (for example, a high-angle or bird's-eye view).

For example, camera movement may include whether the camera pans, tilts, or zooms. For example, panning may be judged by determining criteria for the start and end points of a pan, taking a pre-designated appropriate panning time into account, when the camera angle changes by 5 degrees or more to the left or right. For example, tilting may be judged by determining criteria for the start and end points of a tilt, taking a pre-designated appropriate tilting time into account, when the camera angle changes by 2 degrees or more up or down. For example, zooming may be judged by learning the degree to which the zoom target object grows or shrinks in size.
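The pan and tilt thresholds above can be illustrated with a small check. The 5-degree and 2-degree values come from the text; the function name, the idea of passing angle changes measured over some window, and the omission of the panning and tilting time criteria are simplifying assumptions of this sketch.

```python
PAN_MIN_DEG = 5.0   # left/right angle change named in the text
TILT_MIN_DEG = 2.0  # up/down angle change named in the text


def detect_camera_moves(yaw_change_deg, pitch_change_deg):
    """Return the set of camera moves implied by the given angle changes.

    yaw_change_deg: horizontal angle change over the window (sign = direction).
    pitch_change_deg: vertical angle change over the window.
    How the angles are estimated from footage is outside this sketch.
    """
    moves = set()
    if abs(yaw_change_deg) >= PAN_MIN_DEG:
        moves.add("pan")
    if abs(pitch_change_deg) >= TILT_MIN_DEG:
        moves.add("tilt")
    return moves
```

For instance, a change of 6 degrees to the left with 1 degree of vertical drift would be reported as a pan only.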

For example, audio syllables may include a change in audio loudness, a case where syllables are grouped together, and a scene in which a speaker says a specific sentence. For example, a change in audio loudness may include a case where the loudness changes or where a silence lasts for a pre-designated time or longer. For example, the specific sentence may include at least one of "Shall we take a look?" and "Let's take a look."
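The silence criterion above (a silence lasting at least a pre-designated time) might be checked as follows. This is a minimal sketch: the per-sample loudness list, the `silence_level` cutoff, and the `min_frames` duration are assumed inputs, and the specification does not state concrete values.

```python
def find_silences(levels, silence_level, min_frames):
    """Return (start, end) runs, end exclusive, where loudness stays at or
    below silence_level for at least min_frames consecutive samples."""
    runs = []
    start = None
    for i, level in enumerate(levels):
        if level <= silence_level:
            if start is None:
                start = i  # a quiet run begins here
        else:
            if start is not None and i - start >= min_frames:
                runs.append((start, i))
            start = None
    # Close a quiet run that reaches the end of the input.
    if start is not None and len(levels) - start >= min_frames:
        runs.append((start, len(levels)))
    return runs


runs = find_silences([0.8, 0.0, 0.0, 0.0, 0.7, 0.0, 0.9, 0.0, 0.0], 0.05, 2)
```

The single quiet sample at index 5 is too short to count, so only the runs at indices 1 to 3 and 7 to 8 are reported.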

The AI engine learning unit (105) then generates, from the selected training image data, at least one image data corresponding to at least one scene that satisfies the scene learning parameters (hereinafter 'scene learning image data'). The AI engine learning unit (105) stores the generated scene learning image data in the scene learning database (107).

The AI engine learning unit (105) also trains the AI engine by applying pre-designated editing learning parameters to the selected training image data. For example, the editing learning parameters may serve to select, from among a plurality of scenes, multiple scenes that satisfy the editing learning parameters and to edit the selected scenes.

For example, the editing learning parameters may include at least one of scene transition, audio transition, dialogue recognition, action recognition, and background music position.

For example, scene transition may include the screen complexity of the previous and next scenes, the degree of motion on the screen, changes in lighting on the screen, changes in the screen color scheme, changes in the time of day on the screen (for example, night or day), the movement of an object (a person or thing) and its change in size on the screen, and screen transition techniques (for example, fade, wipe, and dissolve). For example, dialogue recognition may include learning in advance the matching between persons and audio, and recognizing the speaker among the subjects on the screen.

For example, action recognition may include detecting conspicuous motion on the screen and recognizing facial expressions. Detecting conspicuous motion may mean identifying a change in motion large enough to be distinguishable across consecutive scenes. Recognizing facial expressions may mean recognizing and distinguishing the expression of the speaker among the subjects, such as laughter, crying, sadness, or joy.

For example, background music position may include at least one of identifying a section in which background music is inserted, identifying a change in object movement within that section, and identifying a difference in the dynamics and flow of the screen within that section.

The AI engine learning unit (105) then generates, from the selected training image data, at least one image data corresponding to at least one editing point that satisfies the editing learning parameters (hereinafter 'editing learning image data'). The AI engine learning unit (105) stores the generated editing learning image data in the editing learning database (109).

The scene learning database (107) stores at least one scene learning image data. For example, the scene learning database (107) may store a plurality of scene learning image data by type of image data.

The editing learning database (109) stores at least one editing learning image data. For example, the editing learning database (109) may store a plurality of editing learning image data by type of image data.
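Storage grouped by image type, as described for the two databases, can be pictured with an in-memory stand-in. The class name, method names, and identifiers below are hypothetical; the specification does not say how the databases are implemented.

```python
from collections import defaultdict


class LearningDatabase:
    """In-memory stand-in for a learning database grouped by image type."""

    def __init__(self):
        self._by_type = defaultdict(list)

    def add(self, image_type, clip_id):
        """Store one clip identifier under its image type."""
        self._by_type[image_type].append(clip_id)

    def get(self, image_type):
        """Return a copy so callers cannot mutate the stored list."""
        return list(self._by_type.get(image_type, []))


scene_db = LearningDatabase()    # plays the role of database (107)
editing_db = LearningDatabase()  # plays the role of database (109)
scene_db.add("sports", "scene-001")
scene_db.add("review", "scene-002")
editing_db.add("sports", "cut-001")
```

Keying both stores by the same type labels lets later stages look up the scene and editing data that match the type of the original footage.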

The image input unit (111) receives a plurality of original image data (123) from the user and stores them. For example, the original image data may be a plurality of image data captured by a plurality of cameras. At this time, the image input unit (111) may also receive the type of the original image data from the user.

The image synchronization unit (113) synchronizes the reference points of the plurality of original image data (123). For example, as illustrated in FIG. 2, the image synchronization unit (113) may perform audio synchronization by designating a pre-designated sound included in the original image data (123) as the reference point and setting the specific frames at that reference point (for example, year xxxx, month yy, day zz, hour aa, minute bb, second cc, frame dd) as start longitudinal time codes (hereinafter 'LTC'). For example, the synchronization sound may be a clap. For example, one scene of the image data may consist of 30 frames. At this time, the image synchronization unit (113) may use a separate master recording track, or may use the audio track of a specific camera among the plurality of cameras as the master audio track. The image synchronization unit (113) may then perform video synchronization by aligning the frames of the original image data (for example, year xxxx, month yy, day zz, hour aa, minute bb, second cc, frame dd) to the same point in time using the start LTCs of the original image data (123). The image synchronization unit (113) may also remove the silent section before the start of the audio track and add a frame margin after the end of the audio track.
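The idea of aligning several recordings on a shared reference sound can be sketched as follows. This is an assumption-laden illustration, not the patented procedure: it detects the reference sound with a plain amplitude threshold (a stand-in for real clap detection), works on raw sample lists, and ignores time codes and frame margins. The function names and the `threshold` value are hypothetical.

```python
def find_onset(samples, threshold):
    """Index of the first sample whose magnitude reaches threshold, or None."""
    for i, value in enumerate(samples):
        if abs(value) >= threshold:
            return i
    return None


def align_to_reference_sound(tracks, threshold):
    """Trim every track so that it starts at its own reference-sound onset.

    tracks: mapping of track name to a list of audio samples.
    Dropping the leading samples also removes the silence before the sound.
    """
    aligned = {}
    for name, samples in tracks.items():
        onset = find_onset(samples, threshold)
        if onset is None:
            raise ValueError(f"no reference sound found in track {name!r}")
        aligned[name] = samples[onset:]
    return aligned


tracks = {
    "cam_a": [0.0, 0.0, 0.9, 0.2, 0.1],
    "cam_b": [0.0, 0.9, 0.2, 0.1, 0.3],
}
aligned = align_to_reference_sound(tracks, threshold=0.5)
```

After trimming, index 0 of every track corresponds to the same moment (the reference sound), so frames at equal offsets can be compared across cameras.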

The scene analysis unit (115) analyzes a plurality of scenes in the plurality of synchronized original image data using a pre-trained model. For example, the pre-trained model may be a model trained in advance on a large data set that is similar to the problem the user wants to solve. For example, the pre-trained model may be the scene learning image data.

In more detail, the scene analysis unit (115) receives the plurality of synchronized original image data from the image synchronization unit (113). The scene analysis unit (115) then passes the synchronized original image data through the pre-trained model and extracts, from the synchronized original image data, a plurality of features corresponding to a plurality of scenes. For example, the pre-trained model may be a plurality of scene learning image data of the same type as the plurality of original image data.

The scene analysis unit (115) then performs a concatenation (concat) process on each of the extracted features corresponding to the plurality of scenes, converting the extracted features into a plurality of data values. For example, the concat process may be a process of joining the strings that describe an extracted feature to generate a specific data value. The scene analysis unit (115) then outputs the converted data values corresponding to the plurality of scenes to the editing analysis unit (117).
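A minimal sketch of such a concat process, under stated assumptions, is given below in Python. The feature names (speakers, object_motion, camera_motion), the `name=value` encoding, and the `|` separator are hypothetical choices made for illustration; the description does not specify the string format.

```python
def concat_features(features):
    """Join feature name/value pairs into one string, using a fixed (sorted) key order."""
    return "|".join(f"{name}={features[name]}" for name in sorted(features))


# Hypothetical per-scene features as they might come out of the pre-trained model.
scene_features = [
    {"speakers": 1, "object_motion": "low", "camera_motion": "pan"},
    {"speakers": 3, "object_motion": "high", "camera_motion": "static"},
]

# One data value per scene, ready to be passed to the editing analysis stage.
data_values = [concat_features(features) for features in scene_features]
```

Sorting the keys keeps the resulting data values comparable across scenes, since the same feature always lands in the same position of the string.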

The editing analysis unit (117) applies the converted data values to the artificial intelligence program to select several scenes from among the plurality of scenes. The editing analysis unit (117) then automatically edits the selected scenes based on the plurality of editing learning image data to generate one edited image data (119) and editing information (121). Thereafter, the editing analysis unit (117) stores the edited image data (119) and the editing information (121) in a database (not shown) or outputs them through a display unit (not shown). For example, the editing information (121) may include time information for the start and end points of each scene, editing order information, and editing recommendation accuracy information.
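For illustration only, the editing information (121) could be held in a small record type and serialized for storage or display, for instance as XML (the description elsewhere mentions an XML output function for hand-off to a non-linear editing system). The `EditInfo` class, its field names, and the `edit_list`/`clip` element names below are assumptions of this sketch and do not reflect any schema defined in the specification or used by a particular NLE.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class EditInfo:
    start: str       # start timecode of the scene (hh:mm:ss:ff)
    end: str         # end timecode of the scene (hh:mm:ss:ff)
    order: int       # position of the scene in the edited sequence
    accuracy: float  # editing recommendation accuracy, assumed to lie in [0, 1]


def edit_info_to_xml(items):
    """Serialize the editing information, sorted by editing order, as an XML string."""
    root = ET.Element("edit_list")
    for item in sorted(items, key=lambda entry: entry.order):
        ET.SubElement(
            root,
            "clip",
            order=str(item.order),
            start=item.start,
            end=item.end,
            accuracy=f"{item.accuracy:.2f}",
        )
    return ET.tostring(root, encoding="unicode")


xml_text = edit_info_to_xml([
    EditInfo("00:00:10:00", "00:00:14:15", order=2, accuracy=0.87),
    EditInfo("00:00:00:00", "00:00:10:00", order=1, accuracy=0.93),
])
```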

In more detail, the editing analysis unit (117) inputs the converted data values into the artificial intelligence engine model. For example, the artificial intelligence engine model may be a learning engine built by applying an artificial intelligence algorithm developed by the applicant. The editing analysis unit (117) then measures each parameter of the ground-truth video class for each type (e.g., the number of speakers, the magnitude of object movement, the magnitude of camera angle change, the change in audio level) and generates and stores the result as editing recommendation accuracy information. The editing analysis unit (117) then calculates ground-truth data, in which the parameter values of each input image scene are measured, and a cross-entropy loss value. For example, the ground-truth values may represent the per-parameter characteristic values of each scene, and the cross-entropy loss may represent the editing recommendation accuracy.
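The cross-entropy loss referred to above is a standard quantity; the following Python sketch shows one common way to compute it. How the applicant's engine forms its predicted distribution is not disclosed, so the use of a softmax over raw scores, the three-class example, and all names here are assumptions for illustration.

```python
import math


def softmax(scores):
    """Convert raw scores into a probability distribution (numerically stable form)."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def cross_entropy(predicted, target, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return -sum(t * math.log(p + eps) for p, t in zip(predicted, target))


# Hypothetical example: scores for three candidate shot classes for one scene,
# with the ground truth marking the second class as correct (one-hot target).
predicted = softmax([0.2, 1.5, -0.3])
loss = cross_entropy(predicted, [0.0, 1.0, 0.0])
```

A lower loss means the predicted distribution puts more probability on the ground-truth class, which is consistent with reading the loss as an indicator of recommendation accuracy.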

For example, the learning content may include whether a scene is adopted, the length of the scene composition, and editing points (e.g., scene transitions). For example, when the edit is a cross-edit of music broadcasts featuring a specific idol group, the editing points may be placed by dividing the song sung by the idol group phrase by phrase.

For example, the learning content may consist of action recognition. For example, action recognition may include recognizing a person's speaking motion, recognizing movement (e.g., facial expressions, gestures, movements, dance), recognizing the start and/or end of a unit scene, and matching voices to persons (achieved through short-term learning).

For example, the editing analysis unit (117) may perform editing using pre-designated editing rules. For example, the pre-designated editing rules may vary depending on the number of video cameras.

For example, when producing video for a regular broadcast program featuring a specific idol group, a full-shot camera and one camera per group member are assigned to the broadcast, so the number of video cameras may be four or more. In this case, the editing rules may include: giving priority to the full shot captured by the full-shot camera (applied at the beginning and/or the end); using the full shot when two or more persons' voices are present in a scene; and, when one person's voice is present in a scene, recognizing that person and designating and using the dedicated shot captured by that person's camera. For example, by editing the synchronized original image data with these editing rules, the editing analysis unit (117) may, as in graph 301 of FIG. 3, compose the beginning and the end of the edited image data from full shots and the remaining middle part from member shots and right shots.
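The rules for the four-or-more-camera case can be sketched as a simple decision function, shown below in Python. This is an illustrative reading of the rules rather than the disclosed implementation: the function name, the shot labels, and in particular the fallback to the full shot when no voice is detected are assumptions, since the description does not address that case.

```python
def choose_shot(scene_index, scene_count, speakers):
    """Pick a shot label for one scene according to the multi-camera editing rules."""
    # Rule 1: the full shot takes priority at the beginning and the end.
    if scene_index == 0 or scene_index == scene_count - 1:
        return "full"
    # Rule 2: two or more voices in the scene -> use the full shot.
    if len(speakers) >= 2:
        return "full"
    # Rule 3: exactly one voice -> use that person's dedicated camera.
    if len(speakers) == 1:
        return f"member:{speakers[0]}"
    # Assumption (not in the description): fall back to the full shot when silent.
    return "full"


# Hypothetical detected speakers for five consecutive scenes.
speakers_per_scene = [["A"], ["A"], ["A", "B"], ["B"], ["B"]]
sequence = [
    choose_shot(i, len(speakers_per_scene), speakers)
    for i, speakers in enumerate(speakers_per_scene)
]
```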

As another example, when video is produced by an individual or a small group active mainly on social media (SNS), the number of video cameras may be three or fewer. In this case, the editing rules may include: excluding silence cuts (with a +/- frame margin); arranging the editing sequence based on the LTC (absolute value), where shots may be taken from multiple original image data; automatically recognizing subtitles (aligned to the time intervals); designating the master audio (for example, the audio of the camera capturing the Main Creator Shot may be designated as the master audio); and automatically inserting a title clip and an ending clip into the edited image data. For example, by editing the synchronized original image data with these editing rules, the editing analysis unit (117) may, as in graph 303 of FIG. 3, compose the beginning and the end of the edited image data from the Main Creator Shot and the remaining middle part from Object Zoom Shots and Right Shots. In addition, the editing analysis unit (117) provides an XML (eXtensible Markup Language) output function for the edited image data, in case further editing is to be carried out with a non-linear editing system (hereinafter 'NLE') or the like.
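The silence-cut rule with a frame margin can be illustrated with the short Python sketch below. It assumes that a per-frame audio level is already available and that "silence" means a level below a fixed threshold; both the threshold and the function name `keep_ranges` are hypothetical and are not taken from the description.

```python
def keep_ranges(levels, threshold, margin):
    """Return merged [start, end) frame ranges to keep: frames at or above the
    threshold, each range widened by `margin` frames on both sides."""
    ranges = []
    start = None
    for index, level in enumerate(levels):
        if level >= threshold and start is None:
            start = index                    # a non-silent run begins
        elif level < threshold and start is not None:
            ranges.append((start, index))    # the run ends before this frame
            start = None
    if start is not None:
        ranges.append((start, len(levels)))

    # Apply the +/- frame margin, clamped to the clip boundaries.
    padded = [(max(0, s - margin), min(len(levels), e + margin)) for s, e in ranges]

    # Merge ranges that overlap or touch after padding.
    merged = []
    for s, e in padded:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


# Hypothetical per-frame audio levels for a 13-frame clip.
levels = [0, 0, 0, 5, 5, 0, 0, 0, 0, 0, 6, 0, 0]
kept = keep_ranges(levels, threshold=1, margin=1)
```

Everything outside the returned ranges is treated as a silence cut and excluded; a larger margin keeps more context around each spoken segment and may merge neighbouring segments into one.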

With this configuration, an embodiment of the present invention automatically edits image data using artificial intelligence in an image editing device, so that image data can be edited in a short time and with little effort. An embodiment of the present invention also automatically edits image data using artificial intelligence in an image editing device, so that the edited result has consistent quality.

FIG. 4 is a flowchart of automatically editing image data in the image editing device (101) according to an embodiment of the present invention.

Referring to FIG. 4, the artificial intelligence engine learning unit (105) of the image editing device (101) trains the artificial intelligence engine using the plurality of learning image data collected by the image collection unit (103). The process of training the artificial intelligence engine is described in detail below with reference to FIG. 5.

FIG. 5 is a flowchart of training the artificial intelligence engine in the artificial intelligence engine learning unit (105) according to an embodiment of the present invention.

Referring to FIG. 5, in step 501 the artificial intelligence engine learning unit (105) checks whether an image type for training the artificial intelligence engine has been input by the user. For example, the artificial intelligence engine learning unit (105) may receive the image type from the user through an input/output unit (not shown).

If the check shows that an image type has been input, the artificial intelligence engine learning unit (105) proceeds to step 503; otherwise, it repeats step 501.

In step 503, the artificial intelligence engine learning unit (105) selects, from among the plurality of learning image data collected by the image collection unit (103), several learning image data corresponding to the input image type.

In step 505, the artificial intelligence engine learning unit (105) trains the artificial intelligence engine by applying the pre-designated scene learning parameters to the selected learning image data. Here, the artificial intelligence engine learning unit (105) extracts, from the selected learning image data, at least one image data that corresponds to at least one scene satisfying the scene learning parameters, and generates the extracted image data as at least one scene learning image data. The artificial intelligence engine learning unit (105) then stores the generated scene learning image data in the scene learning database (107).

In step 507, the artificial intelligence engine learning unit (105) trains the artificial intelligence engine by applying the pre-designated editing learning parameters to the selected learning image data. Here, the artificial intelligence engine learning unit (105) extracts, from the selected learning image data, at least one image data that corresponds to at least one editing point satisfying the editing learning parameters, and generates the extracted image data as at least one editing learning image data. The artificial intelligence engine learning unit (105) then stores the generated editing learning image data in the editing learning database (109).
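The data-selection side of steps 503 to 507 can be sketched as below in Python; the training of the engine itself is omitted. The dictionary layout of a clip, the tag names, and the interpretation of "satisfying" a parameter as "carrying at least one matching tag" are assumptions made for this example, since the description does not define how satisfaction is evaluated.

```python
def build_learning_databases(clips, image_type, scene_params, edit_params):
    """Select clips of the requested type (step 503) and split out the clips that
    match the scene learning parameters (step 505) and the editing learning
    parameters (step 507)."""
    selected = [clip for clip in clips if clip["type"] == image_type]
    # Assumption: a clip "satisfies" a parameter set if it shares at least one tag.
    scene_db = [clip for clip in selected if scene_params & set(clip["scene_tags"])]
    edit_db = [clip for clip in selected if edit_params & set(clip["edit_tags"])]
    return scene_db, edit_db


# Hypothetical collected learning clips with pre-assigned tags.
clips = [
    {"id": "a", "type": "music", "scene_tags": ["number_of_speakers"], "edit_tags": []},
    {"id": "b", "type": "music", "scene_tags": [], "edit_tags": ["scene_transition"]},
    {"id": "c", "type": "vlog", "scene_tags": ["camera_movement"], "edit_tags": ["scene_transition"]},
]

scene_db, edit_db = build_learning_databases(
    clips,
    image_type="music",
    scene_params={"number_of_speakers", "camera_movement"},
    edit_params={"scene_transition"},
)
```

Here `scene_db` and `edit_db` stand in for the scene learning database (107) and the editing learning database (109), respectively.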

Returning to FIG. 4, in step 403 the image synchronization unit (113) of the image editing device (101) synchronizes the reference points of the plurality of original image data input from the image input unit (111).

For example, as illustrated in FIG. 2, the image synchronization unit (113) may perform audio synchronization by designating a pre-designated sound included in the original image data (123) as the reference point and setting the specific frames at that reference point (e.g., year xxxx, month yy, day zz, hour aa, minute bb, second cc, frame dd) as the start longitudinal time codes (LTC). The image synchronization unit (113) may then perform video synchronization by aligning the frames of the original image data (123) (e.g., year xxxx, month yy, day zz, hour aa, minute bb, second cc, frame dd) so that their start longitudinal time codes fall at the same point in time.
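Aligning clips on their start time codes amounts to comparing frame counts. The Python sketch below converts an `hh:mm:ss:ff` time code to an absolute frame number; the frame rate of 30 frames per second, the non-drop-frame counting, and the function name are assumptions for illustration, and the date portion of the reference point is left out for brevity.

```python
def timecode_to_frames(timecode, fps=30):
    """Convert an hh:mm:ss:ff time code to an absolute frame count (non-drop-frame)."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= fps:
        raise ValueError("frame field must be smaller than the frame rate")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


# Hypothetical start time codes of two cameras at their sync points.
start_a = timecode_to_frames("00:01:02:03")
start_b = timecode_to_frames("00:01:02:18")

# Number of frames by which camera B lags camera A.
lag = start_b - start_a
```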

In step 405, the scene analysis unit (115) of the image editing device (101) analyzes a plurality of scenes in the plurality of synchronized original image data using the pre-trained model.

In step 407, the editing analysis unit (117) of the image editing device (101) selects several scenes from among the plurality of scenes using the artificial intelligence program. The editing analysis unit (117) then automatically edits the selected scenes based on the plurality of editing learning image data to generate one edited image data (119) and editing information (121).

Through this process, an embodiment of the present invention automatically edits image data using artificial intelligence in an image editing device, so that image data can be edited in a short time and with little effort. An embodiment of the present invention also automatically edits image data using artificial intelligence in an image editing device, so that the edited result has consistent quality.

Although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described. Various modifications may be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or perspective of the present invention.

101: Image editing device
103: Image collection unit
105: Artificial intelligence engine learning unit
107: Scene learning database
109: Editing learning database
111: Image input unit
113: Image synchronization unit
115: Scene analysis unit
117: Editing analysis unit
119: Edited image data
121: Editing information
123: Plurality of original image data
125: Plurality of learning image data

Claims

A device for automatically editing image data using artificial intelligence in an image editing device, the device comprising:
an image collection unit that collects and stores a plurality of learning image data;
an image input unit that receives and stores a plurality of original image data;
a scene learning database that stores a plurality of scene learning image data by image type;
an editing learning database that stores a plurality of editing learning image data by the image type;
an artificial intelligence engine learning unit that selects, from among the learning image data, a plurality of learning image data corresponding to a specific image type input by a user, and trains an artificial intelligence engine using the selected learning image data, a pre-designated scene learning parameter, and a pre-designated editing learning parameter;
an image synchronization unit that synchronizes the reference points of the original image data based on a pre-designated specific sound;
a scene analysis unit that analyzes a plurality of scenes in the synchronized original image data using the scene learning image data; and
an editing analysis unit that applies the editing learning image data to an artificial intelligence program using the artificial intelligence engine to select a plurality of scenes from among the analyzed scenes, and edits the selected scenes to generate edited image data,
wherein the artificial intelligence engine learning unit trains the artificial intelligence engine by applying the scene learning parameter to the selected learning image data, extracts scene learning image data satisfying the scene learning parameter from the selected learning image data and stores it in the scene learning database, trains the artificial intelligence engine by applying the editing learning parameter to the selected learning image data, and extracts editing learning image data satisfying the editing learning parameter from the selected learning image data and stores it in the editing learning database,
wherein the scene learning parameter includes at least one of the number of speakers, object movement, screen composition, camera movement, and audio syllables,
wherein the editing learning parameter includes at least one of scene transition, audio transition, dialogue recognition, action recognition, and background music position, and
wherein each of the learning image data has the same length for each image type and has been edited in advance.

delete

A method for automatically editing image data using artificial intelligence in an image editing device, the method comprising:
a process in which an image collection unit collects and stores a plurality of learning image data;
a process in which an artificial intelligence engine learning unit selects, from among the learning image data, a plurality of learning image data corresponding to a specific image type input by a user;
a process in which the artificial intelligence engine learning unit trains an artificial intelligence engine using the selected learning image data, a pre-designated scene learning parameter, and a pre-designated editing learning parameter;
a process in which an image input unit receives and stores a plurality of original image data;
a process of synchronizing the reference points of the original image data based on a pre-designated specific sound;
a process in which a scene analysis unit analyzes a plurality of scenes in the synchronized original image data using scene learning image data;
a process in which an editing analysis unit applies editing learning image data to an artificial intelligence program using the artificial intelligence engine to select a plurality of scenes from among the analyzed scenes; and
a process in which the editing analysis unit edits the selected scenes to generate edited image data,
wherein the process of training the artificial intelligence engine includes:
a process of training the artificial intelligence engine by applying the scene learning parameter to the selected learning image data;
a process of extracting the scene learning image data satisfying the scene learning parameter from the selected learning image data and storing it in a scene learning database;
a process of training the artificial intelligence engine by applying the editing learning parameter to the selected learning image data; and
a process of extracting the editing learning image data satisfying the editing learning parameter from the selected learning image data and storing it in an editing learning database,
wherein the scene learning parameter includes at least one of the number of speakers, object movement, screen composition, camera movement, and audio syllables,
wherein the editing learning parameter includes at least one of scene transition, audio transition, dialogue recognition, action recognition, and background music position, and
wherein each of the learning image data has the same length for each image type and has been edited in advance.

delete