KR102523438B1

KR102523438B1 - System and method for automatic video editing using utilization of auto labeling and insertion of design elements

Info

Publication number: KR102523438B1
Application number: KR1020220163280A
Authority: KR
Inventors: 전동혁; 이우섭
Original assignee: (주)비디오몬스터
Priority date: 2022-11-29
Filing date: 2022-11-29
Publication date: 2023-04-19
Anticipated expiration: 2042-11-29

Abstract

The present invention performs auto labeling on videos (or images) input by a user, performs image analysis on the labeled videos to automatically classify the emotion of each video, automatically selects suitable background music accordingly, automatically selects a virtual template according to the selected image emotion category and background music, and automatically edits the videos through the selected virtual template. The aim is to automatically generate and provide a life log or vlog video file that has an episode or storytelling structure, simply from the footage the user inputs.
As an example, an automatic video editing system using auto labeling and automatic virtual template selection is disclosed, the system including: an image group generation unit that divides a plurality of image files into a plurality of clip images according to object data recognized from the image files, and groups the clip images to generate at least one image group; an image emotion classification unit that performs image emotion analysis on the clip images based on the object data and classifies an image emotion category for the image files as a whole according to the image analysis result; an automatic virtual template selection unit that selects background music according to the classification result of the image emotion category, and automatically selects one virtual template from among a plurality of pre-made virtual templates for video editing according to the selected background music and the image emotion category; and an automatic video editing unit that automatically edits the image group according to the video editing method of the virtual template selected by the automatic virtual template selection unit.

Description

Automatic video editing system and method using auto labeling and automatic virtual template selection

Embodiments of the present invention relate to an automatic video editing system and method using auto labeling and automatic virtual template selection.

In general, it is cumbersome for ordinary users to edit footage shot in everyday life on a smartphone or similar device into a single video, and they find it difficult because it requires knowing how to use somewhat specialized editing tools.

Accordingly, automated video editing tools have been developed and are in use that combine footage shot at different places or times and automatically edit it into a single video.

However, most conventional automatic video editing tools simply arrange the footage in chronological order and provide the concatenated result. The resulting video therefore inevitably has a simple and awkward flow, and for this reason users find it of little interest and are not very satisfied with it.

Korean Patent Application Publication No. 10-2020-0014487 (published February 11, 2020)
Korean Registered Patent No. 10-2340963 (registered December 15, 2021)

Embodiments of the present invention provide an automatic video editing system and method that perform auto labeling on videos (or images) input by a user, perform image analysis on the labeled videos to automatically classify the emotion of each video, automatically select suitable background music accordingly, automatically select a virtual template according to the selected image emotion category and background music, and automatically edit the videos through the selected virtual template, so that a life log or vlog video file with an episode or storytelling structure is automatically generated and provided simply from the footage the user inputs.

An automatic video editing system using auto labeling and automatic virtual template selection according to an embodiment of the present invention includes: an image group generation unit that divides a plurality of image files into a plurality of clip images according to object data recognized from the image files, and groups the clip images to generate at least one image group; an image emotion classification unit that performs image emotion analysis on the clip images based on the object data and classifies an image emotion category for the image files as a whole according to the image analysis result; an automatic virtual template selection unit that selects background music according to the classification result of the image emotion category, and automatically selects one virtual template from among a plurality of pre-made virtual templates for video editing according to the selected background music and the image emotion category; and an automatic video editing unit that automatically edits the image group according to the video editing method of the virtual template selected by the automatic virtual template selection unit.
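
The order in which these four units hand data to one another can be sketched as follows. This is a minimal illustration only: the `Clip` and `EditResult` types, the `run_pipeline` function, and the three callables are hypothetical names introduced here, and the specification does not prescribe any particular programming interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Clip:
    name: str
    objects: List[str]  # recognized object data (hypothetical shape)


@dataclass
class EditResult:
    groups: List[List[Clip]]
    emotion_category: str
    background_music: str
    template: str


def run_pipeline(
    clips: Sequence[Clip],
    group: Callable[[Sequence[Clip]], List[List[Clip]]],
    classify: Callable[[Sequence[Clip]], str],
    select: Callable[[str], Tuple[str, str]],
) -> EditResult:
    """Chain the stages in the order the embodiment lists them."""
    groups = group(clips)               # image group generation unit
    category = classify(clips)          # image emotion classification unit
    music, template = select(category)  # automatic virtual template selection unit
    # The automatic video editing unit would apply `template` to `groups`;
    # this sketch only collects the decisions made by the earlier stages.
    return EditResult(groups, category, music, template)
```

Passing the stages in as callables keeps the sketch independent of how each unit is actually realized, for example by a machine learning model or by a rule table.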

The system may further include: an image file registration unit that receives a plurality of image files from a user communication terminal and uploads them to an automatic editing server; a video file generation unit that combines the image groups automatically edited by the automatic video editing unit to generate a video file; and a video file distribution unit that transmits the video file to the user communication terminal for distribution.

The image file registration unit may include: an image file selection unit that receives a selection of at least one image file from among video files and photo files; and an image file upload unit that uploads the image file selected through the image file selection unit.

The system may further include an image preprocessing unit that converts each image file uploaded through the image file registration unit to a preset size, rotates each image file so that its orientation is aligned with a preset direction, thereby normalizing the image data contained in the image files, and passes the result to the image group generation unit.
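
The normalization described here (bring every input to one orientation and one size) can be illustrated with a small pure-Python sketch. The assumptions are loud ones: an image is modeled as a list of pixel rows, orientation is inferred from the aspect ratio, and nearest-neighbor sampling is used for resizing. None of these choices comes from the specification, and the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values (illustrative model)


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate the pixel grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def resize_nearest(img: Image, width: int, height: int) -> Image:
    """Resize to width x height using nearest-neighbor sampling."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def normalize(img: Image, width: int, height: int) -> Image:
    """Align orientation with the preset target, then convert to the preset size."""
    src_h, src_w = len(img), len(img[0])
    target_is_landscape = width >= height
    source_is_landscape = src_w >= src_h
    if target_is_landscape != source_is_landscape:
        img = rotate_90_clockwise(img)
    return resize_nearest(img, width, height)
```

For instance, `normalize([[1, 2], [3, 4], [5, 6]], 3, 2)` turns a portrait 2x3 grid into the landscape grid `[[5, 3, 1], [6, 4, 2]]`. A production system would more likely read the orientation from file metadata and use an imaging library.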

The image file includes at least one of a video file and a photo file, and the image group generation unit may include: an object recognition unit that recognizes person objects and thing objects in the image file; an image file division unit that, when the image file is a video file, divides the image file according to the person objects and the thing objects to generate a plurality of clip images; an object analysis unit that, for each clip image and photo file, analyzes the thing objects and at least one of the gender, age, behavior, and emotion of the person objects to generate object analysis data; an image array forming unit that infers the context between the clip images and photo files based on the object analysis data, and automatically arranges the clip images and photo files according to the context inference result to form an image array; and an image group generation unit that, based on the context inference result, automatically sets each context end point in the image array as a boundary point, and groups the image array on the basis of the set boundary points to generate the image groups.
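
The last two components (arrange items, then cut the array at context end points) can be sketched as below. The sketch assumes that context inference has already produced one context label per item, which is a simplification made for illustration. The `Item` type and both function names are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Item(NamedTuple):
    name: str
    context: str  # inferred context label (assumed to be given)


def boundary_points(items: Sequence[Item]) -> List[int]:
    """Return the indices at which one context ends and the next begins."""
    return [
        i + 1
        for i in range(len(items) - 1)
        if items[i].context != items[i + 1].context
    ]


def split_groups(items: Sequence[Item], boundaries: Sequence[int]) -> List[List[Item]]:
    """Group the ordered array using the given boundary points."""
    groups: List[List[Item]] = []
    start = 0
    for end in sorted(set(boundaries)) + [len(items)]:
        if start < end <= len(items):
            groups.append(list(items[start:end]))
            start = end
    return groups
```

Because `split_groups` takes the boundary list as an argument, boundaries that a user has adjusted by hand can be passed in place of the automatically computed ones.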

The image array forming unit may display the clip images included in the image array in a playable form and provide a first user interface for changing the order of the clip images and photo files in the image array by drag and drop, and the image group generation unit may provide a second user interface for changing the position of a context end point by drag and drop.

The image emotion classification unit may analyze the image emotion of the clip images and photo files based on the person objects and the thing objects, and classify an image emotion category for the image files as a whole according to the image analysis result.
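
One simple way to go from per-item emotion results to a single category for the whole set is a weighted vote. The specification does not state which aggregation rule is used, so the following is an assumption made purely for illustration. The weights could be, for example, clip durations. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def classify_overall(item_emotions: Sequence[Tuple[str, float]]) -> str:
    """Pick the emotion category with the largest total weight.

    Each entry is (category, weight). Ties are broken alphabetically
    so that the result is deterministic.
    """
    if not item_emotions:
        raise ValueError("at least one analyzed item is required")
    totals: Dict[str, float] = defaultdict(float)
    for category, weight in item_emotions:
        totals[category] += weight
    return min(totals, key=lambda c: (-totals[c], c))
```

With the input `[("joyful", 4.0), ("calm", 3.0), ("calm", 2.5)]` the totals are 4.0 for "joyful" and 5.5 for "calm", so the function returns "calm".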

The automatic video editing unit may include: a sound source beat extraction unit that selects a sound source according to the image emotion category and extracts the beat of the selected sound source; a mood emotion analysis data generation unit that analyzes the mood emotion of the beat of the sound source to generate mood emotion analysis data; a background music selection unit that matches the mood emotion analysis data against the image emotion category, selects the beat of the sound source as background music for the image group based on the matching result, and provides it through the virtual template; and a virtual template selection unit that selects one virtual template from among a plurality of pre-made virtual templates for video editing according to the background music and the image emotion category.
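
The matching and selection described above can be pictured as two lookups: pick the track whose mood scores best fit the image emotion category, then pick a template keyed by the category and a property of that track. Everything concrete in this sketch is hypothetical, including the `Track` type, the mood score dictionary, the 120 BPM threshold, and the template table. The specification does not disclose the actual matching rule.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class Track:
    title: str
    bpm: float                     # tempo derived from the extracted beat
    mood_scores: Dict[str, float]  # mood emotion analysis data (assumed shape)


def select_music(category: str, tracks: Sequence[Track]) -> Track:
    """Choose the track whose mood score for the category is highest."""
    return max(tracks, key=lambda t: t.mood_scores.get(category, 0.0))


def select_template(
    category: str,
    track: Track,
    templates: Dict[Tuple[str, str], str],
    default: str,
) -> str:
    """Look up a pre-made template by (category, tempo band)."""
    tempo = "fast" if track.bpm >= 120 else "slow"  # illustrative threshold
    return templates.get((category, tempo), default)
```

A lookup table keeps template selection transparent and easy to extend as new templates are produced. A learned ranking model would be another option, but nothing in the text requires one.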

The automatic video editing unit may, through the virtual template, automatically set the edit points of the image group according to the playback time of the background music, automatically edit the image group by inserting user-registered text at the edit points, inserting images according to the image emotion category, and setting the position and form of the inserted text, and then apply the background music to the edited image group.
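
As an illustration of setting edit points from the playback time of the background music, the sketch below places a cut every fixed number of beats. Aligning cuts to beats is an assumption made here; the text only says that edit points follow the playback time. The function name and the `beats_per_cut` parameter are hypothetical.

```python
from typing import List


def edit_points(duration_s: float, bpm: float, beats_per_cut: int = 4) -> List[float]:
    """Return cut times in seconds, one every `beats_per_cut` beats."""
    if bpm <= 0 or beats_per_cut <= 0:
        raise ValueError("bpm and beats_per_cut must be positive")
    interval = 60.0 / bpm * beats_per_cut  # seconds between cuts
    points: List[float] = []
    k = 1
    while k * interval < duration_s:
        points.append(round(k * interval, 3))
        k += 1
    return points
```

For a 10-second track at 120 BPM with a cut every 4 beats, the interval is 2 seconds and the function yields `[2.0, 4.0, 6.0, 8.0]`. Text and image insertion would then be attached to these points by the template.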

An automatic video editing method using auto labeling and automatic virtual template selection according to another embodiment of the present invention includes: an image group generation step in which an image group generation unit divides a plurality of image files into a plurality of clip images according to object data recognized from the image files, and groups the clip images based on the object data to generate at least one image group; an image emotion classification step in which an image emotion classification unit performs image emotion analysis on the clip images based on the object data and classifies an image emotion category for the image files as a whole according to the image analysis result; an automatic virtual template selection step in which an automatic virtual template selection unit selects background music according to the classification result of the image emotion, and automatically selects one virtual template from among a plurality of pre-made virtual templates for video editing according to the selected background music and the image emotion category; and an automatic video editing step in which an automatic video editing unit automatically edits the image group according to the video editing method of the virtual template selected in the automatic virtual template selection step.

The method may further include: an image file registration step in which an image file registration unit receives a plurality of image files from a user communication terminal and uploads them to an automatic editing server; a video file generation step in which a video file generation unit combines the image groups automatically edited in the automatic video editing step to generate a video file; and a video file distribution step in which a video file distribution unit transmits the video file to the user communication terminal for distribution.

The image file registration step may include: an image file selection step of receiving a selection of at least one image file from among video files and photo files; and an image file upload step of uploading the image file selected in the image file selection step.

The method may further include an image preprocessing step of converting each image file uploaded in the image file registration step to a preset size, rotating each image file so that its orientation is aligned with a preset direction, thereby normalizing the image data contained in the image files, and passing the result on for execution of the image group generation step.

The image file includes at least one of a video file and a photo file, and the image group generation step may include: an object recognition step of recognizing person objects and thing objects in the image file; an image file division step of, when the image file is a video file, dividing the image file according to the person objects and the thing objects to generate a plurality of clip images; an object analysis step of, for each clip image and photo file, analyzing the thing objects and at least one of the gender, age, behavior, and emotion of the person objects to generate object analysis data; an image array forming step of inferring the context between the clip images and photo files based on the object analysis data, and automatically arranging the clip images and photo files according to the context inference result to form an image array; and an image group generation step of, based on the context inference result, automatically setting each context end point in the image array as a boundary point, and grouping the image array on the basis of the set boundary points to generate the image groups.

The image array forming step may display the clip images included in the image array in a playable form and provide a first user interface for changing the order of the clip images and photo files in the image array by drag and drop, and the image group generation step may provide a second user interface for changing the position of a context end point by drag and drop.

The image emotion classification step may analyze the image emotion of the clip images and photo files based on the person objects and the thing objects, and classify an image emotion category for the image files as a whole according to the image analysis result.

The automatic virtual template selection step may include: a sound source beat extraction step of selecting a sound source according to the image emotion category and extracting the beat of the selected sound source; a mood emotion analysis data generation step of analyzing the mood emotion of the beat of the sound source to generate mood emotion analysis data; a background music selection step of matching the mood emotion analysis data against the image emotion category, selecting the beat of the sound source as background music for the image group based on the matching result, and providing it through the virtual template; and a virtual template selection unit that selects one virtual template from among a plurality of pre-made virtual templates for video editing according to the background music and the image emotion category.

The automatic video editing step may, through the virtual template, automatically set the edit points of the image group according to the playback time of the background music, automatically edit the image group by inserting user-registered text at the edit points, inserting images according to the image emotion category, and setting the position and form of the inserted text, and then apply the background music to the edited image group.

According to the present invention, it is possible to provide an automatic video editing system and method that perform auto labeling on videos (or images) input by a user, perform image analysis on the labeled videos to automatically classify the emotion of each video, automatically select suitable background music accordingly, automatically select a virtual template according to the selected image emotion category and background music, and automatically edit the videos through the selected virtual template, so that a life log or vlog video file with an episode or storytelling structure is automatically generated and provided simply from the footage the user inputs.

FIG. 1 is a schematic diagram illustrating the configuration of an automatic video editing system using auto labeling and automatic virtual template selection according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of an automatic video editing system using auto labeling and automatic virtual template selection according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of an image file registration unit according to an embodiment of the present invention.
FIG. 4 is a block diagram showing the configuration of an image group generation unit according to an embodiment of the present invention.
FIG. 5 is a block diagram showing the configuration of an image emotion classification unit according to an embodiment of the present invention.
FIG. 6 is a block diagram showing the configuration of an automatic virtual template selection unit according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating an automatic video editing process according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating a virtual template production process according to an embodiment of the present invention.
FIG. 9 is a flowchart showing an automatic video editing method using auto labeling and automatic virtual template selection according to another embodiment of the present invention.
FIG. 10 is a flowchart showing an image file registration step according to another embodiment of the present invention.
FIG. 11 is a flowchart showing an image group generation step according to another embodiment of the present invention.
FIG. 12 is a flowchart showing an image emotion classification step according to another embodiment of the present invention.
FIG. 13 is a flowchart showing an automatic virtual template selection step according to another embodiment of the present invention.

The terms used in this specification are briefly explained first, and the present invention is then described in detail.

The terms used in the present invention have been chosen, as far as possible, from general terms in wide current use while taking into account their function in the present invention; however, they may vary depending on the intention of those skilled in the art, precedent, the emergence of new technologies, and the like. In certain cases a term has been arbitrarily selected by the applicant, in which case its meaning is set out in detail in the corresponding part of the description. The terms used in the present invention should therefore be defined on the basis of their meaning and the overall content of the present invention, not simply by their names.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. Terms such as "...unit" and "module" used in the specification denote a unit that processes at least one function or operation, and such a unit may be implemented in hardware, in software, or in a combination of hardware and software.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described here. To describe the invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

FIG. 1 is a schematic diagram illustrating the configuration of an automatic video editing system using auto labeling and automatic virtual template selection according to an embodiment of the present invention.

Referring to FIG. 1, the automatic video editing system 1000 using auto labeling and automatic virtual template selection according to an embodiment of the present invention may be implemented using a user communication terminal 10 and an automatic editing server 20.

The user communication terminal 10 according to an embodiment of the present invention may receive the automatic video editing service through software that carries some of the functions for automatic video editing using auto labeling and automatic virtual template selection.

More specifically, the user communication terminal 10 may be implemented so that it receives the automatic video editing service using auto labeling and automatic virtual template selection either through a dedicated program (for example, an application management program) installed or loaded on the terminal, or by accessing a website through the web browser of the user communication terminal 10. The user communication terminal 10 may include any kind of handheld wireless communication device, such as a PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), or Wibro (Wireless Broadband Internet) terminal, a smartphone, a smart pad, or a tablet PC. Here, a web browser is a program that enables use of World Wide Web (WWW) services, that is, a program that receives and displays hypertext written in HTML (hypertext mark-up language); examples include Netscape, Explorer, and Chrome. An application means an application program on the terminal, and may include an application executed on a mobile terminal (smartphone).

The automatic editing server 20 is connected to a plurality of user communication terminals 10 and receives a plurality of selected image files (videos and photos) from a user communication terminal 10. It automatically labels the footage based on the date, time, and location information contained in the metadata of the received image files and on the objects recognized from the image data; automatically arranges and groups the labeled footage and sets boundaries through context estimation using machine learning; and automatically inserts transitions at the boundary points. In this way it can provide a video file that has an episode or storytelling structure.
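
The labeling step, which combines date, time, and location metadata with recognized objects, can be sketched as follows. The `MediaFile` fields, the label keys, and the morning/afternoon/evening buckets are hypothetical choices for illustration, and the object list is assumed to come from a separate recognizer that is not shown.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Sequence, Tuple


@dataclass
class MediaFile:
    name: str
    captured_at: datetime  # from file metadata
    location: str          # from file metadata
    objects: List[str] = field(default_factory=list)  # from a recognizer


def auto_label(media: MediaFile) -> Dict[str, str]:
    """Build a label from metadata and recognized objects."""
    hour = media.captured_at.hour
    if hour < 12:
        part_of_day = "morning"
    elif hour < 18:
        part_of_day = "afternoon"
    else:
        part_of_day = "evening"
    return {
        "date": media.captured_at.strftime("%Y-%m-%d"),
        "part_of_day": part_of_day,
        "location": media.location,
        "objects": ",".join(sorted(set(media.objects))),
    }


def label_and_order(files: Sequence[MediaFile]) -> List[Tuple[str, Dict[str, str]]]:
    """Label every file and return the results in capture order."""
    ordered = sorted(files, key=lambda f: f.captured_at)
    return [(f.name, auto_label(f)) for f in ordered]
```

Capture-time ordering is only a starting point here. The server described above goes further and rearranges items by inferred context.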

In terms of hardware, the automatic editing server 20 has the same configuration as a typical web server. In terms of software, it may include program modules that are implemented in various languages such as C, C++, Java, Visual Basic, and Visual C and that perform various functions. It may also be implemented on general server hardware using the web server programs that are provided for various operating systems such as DOS, Windows, Linux, Unix, Macintosh, Android, and iOS.

Meanwhile, an example of the communication network that connects the user communication terminal 10 and the automatic editing server 20 over the Internet may include, without particular limitation, a mobile communication network built according to technical standards or communication methods for mobile communication (for example, GSM (Global System for Mobile communication), CDMA (Code Division Multiple Access), CDMA2000 (Code Division Multiple Access 2000), EV-DO (Evolution-Data Optimized or Evolution-Data Only), WCDMA (Wideband CDMA), HSDPA (High Speed Downlink Packet Access), HSUPA (High Speed Uplink Packet Access), LTE (Long Term Evolution), LTE-A (Long Term Evolution-Advanced), 5G, and the like). An example of a wired communication network may be a closed network such as a LAN (Local Area Network) or WAN (Wide Area Network), although an open network such as the Internet is preferable. The Internet refers to the worldwide open computer network structure that provides the TCP/IP protocol and the various services that run above it, namely HTTP (HyperText Transfer Protocol), Telnet, FTP (File Transfer Protocol), DNS (Domain Name System), SMTP (Simple Mail Transfer Protocol), SNMP (Simple Network Management Protocol), NFS (Network File System), and NIS (Network Information Service).

FIG. 2 is a block diagram showing the configuration of an automatic video editing system using auto labeling and automatic virtual template selection according to an embodiment of the present invention. FIG. 3 is a block diagram showing the configuration of an image file registration unit according to an embodiment of the present invention. FIG. 4 is a block diagram showing the configuration of an image group generation unit according to an embodiment of the present invention. FIG. 5 is a block diagram showing the configuration of an image emotion classification unit according to an embodiment of the present invention. FIG. 6 is a block diagram showing the configuration of an automatic virtual template selection unit according to an embodiment of the present invention. FIG. 7 is a diagram for explaining an automatic video editing process according to an embodiment of the present invention. FIG. 8 is a diagram for explaining a virtual template production process according to an embodiment of the present invention.

Referring to FIG. 2, the automatic video editing system 1000 using auto labeling and automatic virtual template selection according to an embodiment of the present invention may include at least one of an image file registration unit 100, an image preprocessing unit 200, an image group generation unit 300, an image emotion classification unit 400, an automatic virtual template selection unit 500, an automatic video editing unit 600, a video file generation unit 700, and a video file distribution unit 800.

The image file registration unit 100 may receive a plurality of image files from the user communication terminal 10 and upload them to the automatic editing server 20.

To this end, as shown in FIG. 3, the image file registration unit 100 may include at least one of an image file selection unit 110 and an image file upload unit 120.

The image file selection unit 110 accesses the album or photo library of the user communication terminal 10 so that at least one image file can be selected from the video files and photo files stored there, or allows the user to select a video or photo captured with the camera of the user communication terminal 10.

The image file upload unit 120 may upload the image files (videos, photos, and the like) selected through the image file selection unit 110 to the automatic editing server 20 over a wired or wireless Internet communication network.

The image preprocessing unit 200 may normalize the image data contained in the image files (videos and photos) uploaded through the image file registration unit 100 by converting each file to a uniform preset size and rotating it so that its orientation is aligned with a preset direction, and may then pass the preprocessed image files to the image group generation unit 300. Because the size and shape of an image can differ depending on its shooting orientation, that is, whether it was shot in portrait or landscape, the images may be rotated so that they are unified.
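The normalization described above can be illustrated with a minimal sketch. The source does not state the preset size, the preset direction, or how aspect ratio is handled, so the 1280x720 landscape target, the 90-degree rotation, and the fit-inside (letterbox) scaling below are all assumptions made only for illustration; `plan_normalization` is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Normalization:
    rotate_degrees: int  # rotation applied before scaling (0 or 90)
    width: int           # output width after rotation and scaling
    height: int          # output height after rotation and scaling


def plan_normalization(width, height, target_w=1280, target_h=720):
    """Plan the rotation and size that bring one frame to the preset layout.

    The target size and the letterbox policy are assumptions, not from the source.
    """
    target_landscape = target_w >= target_h
    is_landscape = width >= height
    rotate = 0 if is_landscape == target_landscape else 90
    if rotate:
        # A 90-degree rotation swaps the two dimensions.
        width, height = height, width
    # Scale uniformly so the frame fits inside the target box without cropping.
    scale = min(target_w / width, target_h / height)
    return Normalization(rotate, round(width * scale), round(height * scale))
```

A real pipeline would then apply the planned rotation and resize with an imaging library; that step is omitted here to keep the sketch free of external dependencies.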

The image group generation unit 300 may divide the plurality of image files (videos and photos) into a plurality of clip images according to the object data recognized in them, infer the context among the clip images based on the recognized object data, and arrange and group the clip images according to the context inference result to generate at least one image group that contains a plurality of clip images.

To this end, as shown in FIG. 4, the image group generation unit 300 may include at least one of an object recognition unit 310, an image file division unit 320, an object analysis unit 330, an image array formation unit 340, and an image group generation unit 350.

The object recognition unit 310 may recognize person objects and thing objects in the preprocessed image files (videos and photos). The object recognition unit 310 may recognize specific objects present in the image data of an image file by using a machine learning algorithm predefined for object recognition. Here, objects may include various targets such as people, animals (dogs, cats, and the like), and things (cars, buildings, bridges, traffic lights, and the like), and this embodiment may provide a recognition process for a preset object or group of objects.

When an uploaded image file is a video file, the image file division unit 320 may divide that video file on the basis of the object data (person objects and thing objects) recognized in it, thereby generating a plurality of clip images.

For example, as shown in (a) and (b) of FIG. 7, suppose the uploaded image files are Video01, Video02, Image01, Video03, Image02, and Video04. If objects 1 and 2 are recognized in Video01, objects 1 and 2 in Video02, objects 2 and 3 in Video03, and objects 2 and 3 in Video04, each video may be divided into a plurality of clip images according to the playback sections in which each recognized object appears. When different objects are recognized in different playback sections of a single video, the video may be divided section by section according to where each object appears. That is, if objects 1 and 2 are recognized in Video01, with object 1 recognized first, then object 2, and then object 1 again, the video may be divided into Clip01, Clip02, and Clip03 according to the recognition order of object 1, object 2, and object 1.
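The section-by-section division in this example can be sketched as a run-length split over per-frame recognition results. This is only an illustration under assumptions: the source does not say how recognition results are represented, so the per-frame label list, the `fps` parameter, and the function name `split_by_object` are hypothetical.

```python
from itertools import groupby


def split_by_object(frame_objects, fps=1.0):
    """Split a video into clips, one per consecutive run of the same object.

    frame_objects: one object label per frame (hypothetical recognizer output).
    Returns a list of (object_label, start_seconds, end_seconds) tuples.
    """
    clips = []
    index = 0
    for label, run in groupby(frame_objects):
        length = len(list(run))
        clips.append((label, index / fps, (index + length) / fps))
        index += length
    return clips


# Object 1, then object 2, then object 1 again yields three clips.
clips = split_by_object([1, 1, 2, 2, 2, 1])
```

Because consecutive frames with the same label are merged and a reappearing object starts a new run, the sequence object 1, object 2, object 1 produces three clips, matching Clip01 to Clip03 in the example.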

The object analysis unit 330 may generate object analysis data by analyzing, for each clip image and photo file, at least one of the gender, age, behavior, and emotion of each person object, together with the characteristics of each thing object. The object analysis data serves as basic information for the context inference described below. Before the context among the clip images and photos is inferred, the characteristic elements of the person objects appearing in each file (gender, age, behavior, emotion, and the like) and the characteristic elements of the thing objects may each be analyzed using a predefined machine learning algorithm, and each analysis result may be used as one item of basic information for the context inference.

The image array formation unit 340 may infer the context among the clip images and photo files based on the object analysis data for the image files (clip images and photo files), and may form an image array by automatically arranging or sorting the clip images and photo files according to the context inference result.

For example, for Clip01, Clip02, Clip03, Clip04, Clip05, Image01, Clip06, Clip07, Clip08, Image02, Clip09, and Clip10 divided as shown in (b) of FIG. 7, context inference may be performed through a pre-trained machine learning algorithm that takes into account the shooting time and shooting location of each clip image and photo file and the various characteristics of the objects (gender, age, behavior, and emotion), so as to define or determine an order of files that has a particular story or progression. According to this context inference result, an image array with the order Clip02, Clip04, Clip03, Image01, Clip01, Clip05, Clip06, Clip09, Image02, Clip08, Clip10, Clip07 may be formed, as shown in (c) of FIG. 7.

Meanwhile, the image array formation unit 340 may provide a first user interface that displays the clip images in the image array so that they can be played back, and that allows the order of the clip images and photo files in the image array to be changed by drag and drop.

For example, when the image array shown in (c) of FIG. 7 is formed in the order Clip02, Clip04, Clip03, Image01, Clip01, Clip05, Clip06, Clip09, Image02, Clip08, Clip10, Clip07, selecting Clip02 and dragging and dropping it between Clip01 and Clip05, and then selecting Image02 and dropping it in front of Clip04, rearranges the image array into the order Image02, Clip04, Clip03, Image01, Clip01, Clip02, Clip05, Clip06, Clip09, Clip08, Clip10, Clip07.
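The reordering in this example can be reproduced with a small list operation. The sketch below only models the result of a drop, not the user interface itself; `move_before` is a hypothetical helper, and it assumes every item name in the array is unique.

```python
def move_before(order, item, before=None):
    """Return a new order with `item` moved directly in front of `before`.

    If `before` is None, the item is moved to the end of the array.
    """
    result = [entry for entry in order if entry != item]
    position = len(result) if before is None else result.index(before)
    result.insert(position, item)
    return result


order = ["Clip02", "Clip04", "Clip03", "Image01", "Clip01", "Clip05",
         "Clip06", "Clip09", "Image02", "Clip08", "Clip10", "Clip07"]
# Drop Clip02 between Clip01 and Clip05, then drop Image02 in front of Clip04.
order = move_before(order, "Clip02", before="Clip05")
order = move_before(order, "Image02", before="Clip04")
```

After these two moves, `order` starts with Image02, Clip04, Clip03, Image01, Clip01, Clip02, Clip05, which is the rearranged sequence given in the example.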

The image group generation unit 350 may automatically set each context end point in the image array as a boundary point, based on the context inference result from the image array formation unit 340, and may generate image groups by grouping the image array around the boundary points that have been set.

For example, as shown in (c) of FIG. 7, when A1 and A2 are set as the points at which the flow of a particular story ends according to the context inference result, a first image group consisting of Clip02, Clip04, Clip03, and Image01, a second image group consisting of Clip01, Clip05, Clip06, and Clip09, and a third image group consisting of Image02, Clip08, Clip10, and Clip07 may each be defined with A1 and A2 as the dividing points.

Meanwhile, the image group generation unit 350 may provide a second user interface for changing the position of a context end point, that is, a boundary point, by drag and drop.

For example, with the boundary points A1 and A2 initially set as shown in (c) of FIG. 7, selecting A1 and dragging and dropping it between Clip01 and Clip05 may reconfigure the first image group as Clip02, Clip04, Clip03, Image01, and Clip01, and the second image group as Clip05, Clip06, and Clip09.
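Grouping by boundary points, and the effect of moving a boundary, can be sketched as slicing the image array at a list of indices. Representing a boundary point as the index at which the next group starts is an assumption made for this illustration, as is the helper name `split_at`.

```python
def split_at(order, boundaries):
    """Split `order` into groups; each boundary is the index where a new group starts."""
    cuts = [0, *sorted(boundaries), len(order)]
    return [order[start:end] for start, end in zip(cuts, cuts[1:])]


array = ["Clip02", "Clip04", "Clip03", "Image01", "Clip01", "Clip05",
         "Clip06", "Clip09", "Image02", "Clip08", "Clip10", "Clip07"]

# A1 sits after Image01 (index 4) and A2 after Clip09 (index 8).
initial_groups = split_at(array, [4, 8])
# Dragging A1 to the gap between Clip01 and Clip05 moves it to index 5.
adjusted_groups = split_at(array, [5, 8])
```

With this representation, moving a boundary never reorders the clips; it only changes which group the clips next to the boundary belong to, which is the behavior described in the example.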

The image emotion classification unit 400 may perform image analysis on the object data of each clip image generated through the image group generation unit 300, and may classify the image emotion according to each image analysis result.

To this end, as shown in FIG. 5, the image emotion classification unit 400 may analyze the image emotion of the clip images and photo files by using a predefined machine learning algorithm based on the person objects and thing objects, and may classify an image emotion category for the whole set of image files registered by the user according to the analysis result.

For example, the overall mood of an image may be classified by category according to the gender or age of the person objects. In more detail, the facial expressions of person objects may be defined as joy, anger, sadness, and so on, and facial expression analysis data for a person object may be generated by determining which category that person's expression belongs to. Likewise, the behavior of person objects may be defined as calm, intense, and so on, and behavior analysis data for a person object may be generated by determining which category that person's behavior belongs to. The facial expression analysis data, behavior analysis data, and thing-object metadata generated in this way may be combined to estimate and determine the emotion of the image in the corresponding clip image or photo file.

For example, using a predefined machine learning algorithm, image emotion categories such as 'travel destination review' and 'joy or happiness' may be output as result values for the input values, namely the facial expression analysis data, behavior analysis data, and thing-object metadata. According to these image emotion category values, the images may be classified under the image emotion data value of the category 'happy and enjoyable travel destination review' among a plurality of predefined image emotion categories, and this value may be automatically labeled on the clip images and photo files.

In addition, the image emotion or mood may be classified by category according to the type of recognized thing object, that is, the type of things that make up the background of the image.

The image emotion classification unit 400 may derive an average image emotion analysis result from the individual image emotion analysis results for the person objects and the thing objects, and may make the final classification into the corresponding image emotion category.
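The averaging step can be illustrated with a minimal sketch. The source does not specify how the person-object and thing-object results are represented or combined, so the per-category score dictionaries, the unweighted mean, and the category names below are assumptions; `classify_emotion` is a hypothetical helper.

```python
def classify_emotion(person_scores, thing_scores):
    """Average per-category scores from the two analyses and pick the top category.

    Both arguments map a category name to a score; a missing category counts as 0.
    Returns (best_category, averaged_scores).
    """
    categories = set(person_scores) | set(thing_scores)
    if not categories:
        raise ValueError("at least one category score is required")
    averaged = {
        name: (person_scores.get(name, 0.0) + thing_scores.get(name, 0.0)) / 2
        for name in categories
    }
    # Sorting first makes ties resolve to the alphabetically first category.
    best = max(sorted(averaged), key=averaged.get)
    return best, averaged


category, scores = classify_emotion(
    {"joy": 0.75, "sadness": 0.25},  # hypothetical person-object result
    {"joy": 0.25, "sadness": 0.5},   # hypothetical thing-object result
)
```

A weighted mean, or a learned model that consumes both results, would fit the description equally well; the unweighted mean is simply the most direct reading of "average".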

The automatic virtual template selection unit 500 may automatically select background music according to the image emotion category classified by the image emotion classification unit 400, and may automatically select one virtual template from a plurality of pre-made virtual templates for video editing according to the selected background music and the image emotion category.

The virtual template according to this embodiment is a software means for automatic video editing and may be produced in advance in a form optimized for each image emotion category. For example, for each image emotion category such as 'travel destination review', 'car review', or 'food review', a video editing tool may be produced to suit the background music and the image emotion category. Such a tool either edits the video automatically by inserting matching background music, visual and auditory effects, subtitles, and the like, or performs automatic editing by automatically inserting background music, visual and auditory effects, subtitles, and the like that have been inserted into it in advance.

As an example of a method for producing such a virtual template, as shown in FIG. 8, a sound source that suits the image emotion category is selected, beats are extracted from that sound source, edit points for the video are automatically set according to the background music made up of the extracted beats, and the background music is applied to each scene according to the edit points. A virtual template on which a rendering test of this result can be performed may then be produced and distributed.
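One way to tie edit points to extracted beats is to snap each desired cut time to the nearest beat. The source does not describe how edit points are derived from the beats, so this is only a sketch under that assumption; the beat list is taken to be a sorted list of times in seconds, and `snap_to_beat` is a hypothetical helper.

```python
from bisect import bisect_left


def snap_to_beat(time, beats):
    """Return the beat time closest to `time`.

    `beats` must be a non-empty list of beat times sorted in ascending order.
    """
    if not beats:
        raise ValueError("beats must not be empty")
    index = bisect_left(beats, time)
    # Only the beat just before and the beat at or after `time` can be closest.
    candidates = beats[max(index - 1, 0): index + 1]
    return min(candidates, key=lambda beat: abs(beat - time))


beats = [0.0, 0.5, 1.0, 1.5]  # hypothetical beat times in seconds
cut = snap_to_beat(0.7, beats)
```

Cutting on a beat is a common editing convention, which is why it is used as the stand-in here; the template production method in FIG. 8 may use a different rule.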

As shown in FIG. 6, the automatic virtual template selection unit 500 may include at least one of a sound source beat extraction unit 510, a mood emotion analysis data generation unit 520, a background music selection unit 530, and a virtual template selection unit 540.

The sound source beat extraction unit 510 may select a sound source according to the image emotion category and extract the beats of the selected sound source. Because the image emotion category has been classified to fit the overall image emotion of the video, at least one sound source that matches the image emotion can be automatically selected as a resource for background music, and the beats of each selected sound source are extracted so that they can be used as background music for the clip images and photo files.

The mood emotion analysis data generation unit 520 may generate mood emotion analysis data by analyzing the mood emotion of the beats extracted by the sound source beat extraction unit 510. Whereas the sound source selection described above is the result of classification according to the overall image emotion of the video, the mood emotion refers to emotion data subdivided according to the beats of the sound source. Using a predefined machine learning algorithm, mood emotion analysis data for humor, anger, happiness, boredom, and so on may be output as result values for the beat data given as input. The mood emotion analysis data generated in this way may be automatically labeled on the beats of the sound source.

The background music selection unit 530 matches the mood emotion analysis data generated by the mood emotion analysis data generation unit 520 against the image emotion category generated by the image emotion analysis unit 420 and, based on the matching result, selects beats of the sound source as the background music of an image group so that they can be applied to that image group through the virtual template.

For example, the unit looks for data that match each other among the mood emotion analysis data labeled on the sound source and the image emotion categories labeled on the images. If mood emotion analysis data A and image emotion category B match, the sound source beat a labeled with mood emotion analysis data A may be selected as the background music for the scene of the clip image or photo file labeled with image emotion category B.
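The label matching in this example can be sketched as a lookup between two sets of labels. The source does not define what makes a mood label and an image emotion category "match", so the explicit compatibility table below is an assumption, and `select_background_music` and all label values are hypothetical.

```python
def select_background_music(clip_categories, beat_moods, compatible):
    """Pick, for each clip, the first beat track whose mood matches its category.

    clip_categories: clip name -> image emotion category label
    beat_moods:      beat track name -> mood emotion label
    compatible:      image emotion category -> set of matching mood labels
    Returns clip name -> selected beat track name, or None when nothing matches.
    """
    selection = {}
    for clip, category in clip_categories.items():
        wanted = compatible.get(category, set())
        selection[clip] = next(
            (track for track, mood in beat_moods.items() if mood in wanted),
            None,
        )
    return selection


selection = select_background_music(
    clip_categories={"Clip01": "B", "Image01": "D"},
    beat_moods={"a": "A", "c": "C"},
    compatible={"B": {"A"}},  # mood A matches category B, as in the example
)
```

Here Clip01, labeled with category B, receives beat track a, labeled with mood A, while Image01 has no compatible track and is left unassigned.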

The virtual template selection unit 540 automatically selects, from the plurality of pre-made virtual templates for video editing, the one virtual template that fits the previously selected background music and the image emotion category, so that video editing of the image groups can be performed automatically by the selected virtual template.

The automatic video editing unit 600 may automatically edit the image groups according to the automatic video editing method of the virtual template selected by the automatic virtual template selection unit 500.

More specifically, as shown in (d) of FIG. 7, the virtual template may be used to automatically set edit points E1 and E2 for the image groups according to the playback time of the background music, automatically edit the image groups according to the edit points, and apply the background music to the edited image groups. Background music may be applied to an image group (or to a clip image or photo file), and the applied background music has a playback time of approximately 90 to 150 seconds. The point or time at which the image group or clip image (or photo file) should be edited to fit the playback time of the background music may be set automatically.

In addition, when the playback time of an image group differs from the playback time of the background music applied to it, the editing time or point of the image group is automatically set as appropriate for that background music so that the background music is applied in line with the playback section of the image group. The automatically set edit points E1 and E2 can also be adjusted manually by the user through the virtual template, so that the background music of a specific clip image or photo file can be edited manually.
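One simple policy for reconciling group playback times with the background music length is to scale every group by the same factor and place the edit points at the scaled group ends. The source does not state the actual rule, so proportional scaling is an assumption used only to make the idea concrete; `edit_points` is a hypothetical helper and all durations are in seconds.

```python
from itertools import accumulate


def edit_points(group_durations, music_duration):
    """Scale group durations to the music length and return the cut times.

    Returns one edit point per gap between consecutive groups, so three
    groups yield two edit points (such as E1 and E2).
    """
    total = sum(group_durations)
    if total <= 0:
        raise ValueError("group durations must sum to a positive value")
    scale = music_duration / total
    ends = list(accumulate(duration * scale for duration in group_durations))
    return ends[:-1]  # the last end coincides with the end of the music


# Three groups totalling 240 s fitted to a 120 s background track.
points = edit_points([40, 80, 120], 120)
```

In this run the three groups are compressed to 20, 40, and 60 seconds, so the two edit points fall at 20 and 60 seconds. Because the user may adjust E1 and E2 by hand afterwards, the automatic result only needs to be a reasonable starting point.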

In addition, the image groups may be automatically edited by inserting user-registered text (title, ending, subtitles, and the like) at the set edit points E1 and E2, inserting images (effect images) according to the image emotion category, and setting the position and form of the inserted text.

This automatic video editing method differs slightly depending on the selected virtual template, which in turn depends on the image emotion classification result that determined the background music and the image emotion category. That is, a virtual template is automatically selected according to the image emotion classification result for the images input by the user, and the video is then edited automatically with the background music and the video editing method appropriate to that template.

The video file generation unit 700 may generate a single video file by joining the image groups, into which transition images have been inserted, together with the boundary points A1 and A2. If the user has uploaded text information for a title, an ending, or the like, image clips containing that text information may additionally be inserted at the start and end points of the video file.

The video file distribution unit 800 renders and compresses the video file and then transmits it to the user communication terminal 10 over a wired or wireless Internet communication network for distribution. In this way it can provide a single video file that has been edited from the videos and photos originally uploaded so that it has a particular story or sequence and produces smooth and effective scene transitions.

FIG. 9 is a flowchart showing the configuration of an automatic video editing method using auto labeling and automatic virtual template selection according to another embodiment of the present invention. FIG. 10 is a flowchart showing the configuration of an image file registration step according to another embodiment of the present invention. FIG. 11 is a flowchart showing the configuration of an image group generation step according to another embodiment of the present invention. FIG. 12 is a flowchart showing the configuration of an image emotion classification step according to another embodiment of the present invention. FIG. 13 is a flowchart showing the configuration of an automatic virtual template selection step according to another embodiment of the present invention.

Referring to FIG. 9, the automatic video editing method (S1000) using auto labeling and automatic virtual template selection according to an embodiment of the present invention may include at least one of an image file registration step (S100), an image preprocessing step (S200), an image group generation step (S300), an image emotion classification step (S400), an automatic virtual template selection step (S500), an automatic video editing step (S600), a video file generation step (S700), and a video file distribution step (S800).

In the image file registration step (S100), a plurality of image files may be received from the user communication terminal (10) and uploaded to the automatic editing server (20).

To this end, as shown in FIG. 10, the image file registration step (S100) may include at least one of an image file selection step (S110) and an image file upload step (S120).

In the image file selection step (S110), the album or photo library of the user communication terminal (10) is accessed so that the user can select at least one image file from among the video files and photo files stored there, or can select a video or photo captured with the camera of the user communication terminal (10).

In the image file upload step (S120), the image files (videos, photos, and the like) selected in the image file selection step (S110) may be uploaded to the automatic editing server (20) over a wired or wireless Internet network.

In the image preprocessing step (S200), each image file (video or photo) uploaded in the image file registration step (S100) is converted to a preset, uniform size and rotated so that its orientation matches a preset direction, which normalizes the image data contained in the file. The preprocessed image files may then be passed to the image group generation step (S300). Because the size and shape of an image can differ depending on whether it was shot vertically or horizontally, the image may be rotated so that all files share the same orientation.
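The geometry decision in this step can be sketched as follows. The text only says "preset size" and "preset direction", so the 1920x1080 landscape target is an assumption, and `Geometry` and `normalize_geometry` are hypothetical names. The sketch only decides the rotation and output size; it does not touch pixel data and ignores aspect-ratio handling such as letterboxing.

```python
from dataclasses import dataclass


@dataclass
class Geometry:
    rotate_degrees: int  # rotation to apply before resizing
    width: int           # output width in pixels
    height: int          # output height in pixels


def normalize_geometry(
    width: int,
    height: int,
    target_width: int = 1920,   # assumed preset size
    target_height: int = 1080,  # assumed preset size
) -> Geometry:
    """Decide the rotation and output size that bring a source frame
    to the preset size and orientation."""
    target_is_landscape = target_width >= target_height
    source_is_landscape = width >= height
    # Rotate by 90 degrees only when the source orientation differs
    # from the preset orientation.
    rotate = 0 if source_is_landscape == target_is_landscape else 90
    return Geometry(rotate, target_width, target_height)
```

With these defaults, a 1080x1920 portrait clip is rotated by 90 degrees and a 1280x720 landscape clip is left unrotated; both end up at the same output size.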

In the image group generation step (S300), the image files (videos, photos) are divided into a plurality of clip images according to the object data recognized in them, the context between the clip images is inferred from the recognized object data, and the clip images are arranged and grouped according to the context inference result to generate at least one image group containing a plurality of clip images.

To this end, as shown in FIG. 11, the image group generation step (S300) may include at least one of an object recognition step (S310), an image file division step (S320), an object analysis step (S330), an image array formation step (S340), and an image group generation step (S350).

In the object recognition step (S310), person objects and thing objects may each be recognized in the preprocessed image files (videos, photos). This step may use a predefined machine learning algorithm for object recognition to recognize specific objects present in the image data of an image file. Here, objects may include a variety of targets such as people, animals (dogs, cats, and the like), and things (cars, buildings, bridges, traffic lights, and the like), and this embodiment may provide a recognition process for a preset object or group of objects.

In the image file division step (S320), when an uploaded image file is a video file, the video file is divided on the basis of the object data (person objects, thing objects) recognized in it, generating a plurality of clip images.
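One way to read "divided on the basis of the recognized objects" is to cut the video wherever the set of recognized objects changes. The sketch below takes that reading as an assumption; the source does not specify the cutting rule. `split_into_clips` is a hypothetical name, and the per-frame label sets are assumed to come from the object recognition step (S310).

```python
from itertools import groupby
from typing import FrozenSet, List, Tuple

Clip = Tuple[int, int, FrozenSet[str]]  # (start_frame, end_frame_exclusive, objects)


def split_into_clips(frame_objects: List[FrozenSet[str]]) -> List[Clip]:
    """Group consecutive frames that share the same recognized object
    set into one clip."""
    clips: List[Clip] = []
    start = 0
    for objects, run in groupby(frame_objects):
        length = sum(1 for _ in run)
        clips.append((start, start + length, objects))
        start += length
    return clips
```

A real system would likely smooth the per-frame labels first so that a single misdetected frame does not create a one-frame clip.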

In the object analysis step (S330), object analysis data may be generated by analyzing, for each clip image and photo file, at least one of the gender, age, behavior, and emotion of each person object, together with the features of each thing object. The object analysis data serves as base information for the context inference described below. Before the context across the clip images and photos is inferred, the characteristic elements of the person objects appearing in each file (gender, age, behavior, emotion, and so on) and the characteristic elements of the thing objects may each be analyzed with a predefined machine learning algorithm, and each analysis result may be used as one piece of base information for that context inference.

In the image array formation step (S340), the context between the clip images and photo files is inferred from their object analysis data, and the clip images and photo files are automatically arranged or sorted according to the context inference result to form an image array.

The image array formation step (S340) may also provide a first user interface that displays the clip images in the image array in playable form and lets the user change the order of the clip images and photo files in the array by drag and drop.

In the image group generation step (S350), based on the context inference result from the image array formation step (S340), each context end point in the image array is automatically set as a boundary point, and the image array is grouped at the set boundary points to generate image groups.
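Grouping at boundary points amounts to splitting an ordered list after each context end point. A minimal sketch, assuming the boundary points are given as indices of the last item in each context (the source does not say how they are encoded), with `group_at_boundaries` as a hypothetical name:

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group_at_boundaries(array: Sequence[T], boundaries: Sequence[int]) -> List[List[T]]:
    """Split `array` after each index in `boundaries`.

    A boundary on the last item, or outside the array, is ignored
    because it would not separate anything.
    """
    groups: List[List[T]] = []
    start = 0
    for end in sorted(set(boundaries)):
        if start <= end < len(array) - 1:
            groups.append(list(array[start:end + 1]))
            start = end + 1
    groups.append(list(array[start:]))
    return groups
```

Moving a boundary through the drag-and-drop interface would then correspond to changing one index in `boundaries` and regrouping.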

The image group generation step (S350) may also provide a second user interface for changing the position of a context end point, that is, a boundary point, by drag and drop.

In the image emotion classification step (S400), image analysis is performed on the object data of each clip image generated in the image group generation step (S300), and an image emotion is classified for each analysis result.

To this end, as shown in FIG. 12, the image emotion classification step (S400) may analyze the image emotion of the clip images and photo files using a predefined machine learning algorithm based on the person objects and thing objects, and may classify an image emotion category for the entire set of image files registered by the user according to the analysis result.

The image emotion classification step (S400) may derive an average image emotion analysis result from the individual results for the person objects and thing objects, and make the final classification into the corresponding image emotion category.
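The averaging described here can be sketched as below, assuming each per-object or per-clip analysis yields a score per candidate category. That score representation is an assumption; the source only says that an average result is derived. `classify_overall` is a hypothetical name.

```python
from typing import Dict, List


def classify_overall(per_item_scores: List[Dict[str, float]]) -> str:
    """Average the category scores over all items and return the
    category with the highest average."""
    if not per_item_scores:
        raise ValueError("at least one score dictionary is required")
    totals: Dict[str, float] = {}
    for scores in per_item_scores:
        for category, value in scores.items():
            totals[category] = totals.get(category, 0.0) + value
    count = len(per_item_scores)
    # A category missing from an item counts as a score of zero there.
    averages = {category: total / count for category, total in totals.items()}
    return max(averages, key=lambda category: averages[category])
```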

In the automatic virtual template selection step (S500), background music is automatically selected according to the image emotion category classified in the image emotion classification step (S400), and one virtual template is automatically selected from a plurality of pre-made virtual templates for video editing according to the selected background music and the image emotion category.

The virtual template of this embodiment is a software means for automatic video editing and may be produced in advance in a form optimized for each image emotion category. For example, for image emotion categories such as 'travel destination review', 'car review', and 'food review', a video editing tool that automatically inserts matching background music, visual and auditory effects, subtitles, and the like to edit the video may be produced for each combination of background music and image emotion category. As an example of how such a virtual template is produced, as shown in FIG. 8, a sound source matching the image emotion category is selected, beats are extracted from the sound source, edit points for the video are automatically set according to the background music made up of the extracted beats, the background music is applied to each scene according to the edit points, and a virtual template on which a rendering test can be run is produced and distributed.
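Setting edit points from the beats can be illustrated with a constant-tempo track. This is a simplified sketch: real beat extraction requires audio analysis, which is replaced here by an assumed known tempo in beats per minute, and the rule of cutting on every N-th beat is an illustrative choice rather than something the source states. `beat_times` and `edit_points` are hypothetical names.

```python
from typing import List


def beat_times(bpm: float, duration_s: float) -> List[float]:
    """Timestamps in seconds of evenly spaced beats at a constant tempo."""
    interval = 60.0 / bpm
    count = int(duration_s / interval)
    return [round(i * interval, 3) for i in range(count + 1)]


def edit_points(beats: List[float], beats_per_cut: int) -> List[float]:
    """Place a cut on every `beats_per_cut`-th beat, skipping the first beat."""
    return [t for i, t in enumerate(beats) if i > 0 and i % beats_per_cut == 0]
```

For a 120 BPM track, beats fall every 0.5 seconds, so cutting on every fourth beat yields a scene change every 2 seconds.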

As shown in FIG. 13, the automatic virtual template selection step (S500) may include at least one of a sound source beat extraction step (S510), a mood emotion analysis data generation step (S520), a background music selection step (S530), and a virtual template selection step (S540).

In the sound source beat extraction step (S510), a sound source is selected according to the image emotion category, and the beats of the selected sound source are extracted. Because the image emotion category is classified to fit the overall image emotion of the video, at least one sound source matching that emotion may be automatically selected as a resource for the background music, and the beats of each selected sound source are extracted so that they can be used as background music for the clip images and photo files.

In the mood emotion analysis data generation step (S520), the mood emotion of the beats extracted in the sound source beat extraction step (S510) is analyzed to generate mood emotion analysis data. Whereas the sound source selection described above results from classification by the overall image emotion of the video, mood emotion refers to emotion data subdivided according to the beats of the sound source. A predefined machine learning algorithm takes the beat data as input and outputs mood emotion analysis data for humor, anger, happiness, boredom, and the like. The mood emotion analysis data generated in this way may be automatically labeled on the beats of the sound source.

In the background music selection step (S530), the mood emotion analysis data generated in the mood emotion analysis data generation step (S520) is matched against the image emotion category generated in the image emotion analysis step (S420). Based on the matching result, the beats of the sound source are selected as background music for the image group so that they can be applied to that image group through the virtual template.
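The matching between mood emotion analysis data and the image emotion category could look like the sketch below. The source does not define the matching rule, so everything here is an assumption made for illustration: the `CATEGORY_TO_MOOD` table, the per-track mood scores, and the name `select_background_music` are all hypothetical.

```python
from typing import Dict

# Hypothetical table: which mood each image emotion category calls for.
CATEGORY_TO_MOOD: Dict[str, str] = {
    "travel review": "happiness",
    "food review": "humor",
}


def select_background_music(category: str, tracks: Dict[str, Dict[str, float]]) -> str:
    """Return the track whose score for the category's target mood is highest.

    `tracks` maps a track name to its mood scores, for example
    {"track_a": {"happiness": 0.2, "boredom": 0.7}}.
    """
    mood = CATEGORY_TO_MOOD[category]
    return max(tracks, key=lambda name: tracks[name].get(mood, 0.0))
```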

In the virtual template selection step (S540), one virtual template that fits the previously selected background music and image emotion category is automatically selected from the plurality of pre-made virtual templates for video editing, so that the image groups are edited automatically by the selected virtual template.

In the automatic video editing execution step (S600), the image group may be edited automatically according to the automatic video editing method of the virtual template selected in the automatic virtual template selection step (S500).

To this end, as shown in FIG. 13, the automatic video editing execution step (S600) may include at least one of a sound source beat extraction step (S610), a mood emotion analysis data generation step (S620), a background music selection step (S630), and an automatic video editing step (S640).

In the video file generation step (S700), the image groups into which transition videos have been inserted may be joined to one another at the boundary points A1 and A2 to generate a single video file. If the user has uploaded text information such as a title or an ending, video clips containing that text may additionally be inserted at the start and end of the video file.
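Assembling the final sequence can be sketched as a list-building operation. This is a minimal illustration that assumes clips are identified by name and that one transition is placed at each boundary between groups; `build_timeline` and the `title:` and `ending:` markers are hypothetical.

```python
from typing import List, Optional


def build_timeline(
    groups: List[List[str]],
    title: Optional[str] = None,
    ending: Optional[str] = None,
    transition: str = "transition",
) -> List[str]:
    """Join image groups into one ordered timeline, placing a transition
    at each group boundary and optional title and ending clips at the
    two ends."""
    timeline: List[str] = []
    if title:
        timeline.append(f"title:{title}")
    for index, group in enumerate(groups):
        if index > 0:
            timeline.append(transition)
        timeline.extend(group)
    if ending:
        timeline.append(f"ending:{ending}")
    return timeline
```

With three groups, the two transitions land at the two boundaries between them, which matches the two boundary points A1 and A2 mentioned above.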

In the video file distribution step (S800), the video file is rendered and compressed and then transmitted to the user communication terminal (10) over a wired or wireless Internet network. The user thereby receives a single video file that is built from the videos and photos originally uploaded, has a specific story or sequence, and is edited so that scene transitions are smooth and effective.

The foregoing is merely one embodiment for implementing the automatic video editing system and method through the use of auto labeling and automatic selection of virtual templates according to the present invention. The present invention is not limited to this embodiment. As claimed in the following claims, its technical spirit extends to the range within which anyone with ordinary knowledge in the field to which the invention pertains can make various modifications without departing from the gist of the invention.

1000: Automatic video editing system through the use of auto labeling and automatic selection of virtual templates
100: Image file registration unit
110: Image file selection unit
120: Image file upload unit
200: Image preprocessing unit
300: Image group generation unit
310: Object recognition unit
320: Image file division unit
330: Object analysis unit
340: Image array formation unit
350: Image group generation unit
400: Image emotion classification unit
500: Automatic virtual template selection unit
510: Sound source beat extraction unit
520: Mood emotion analysis data generation unit
530: Background music selection unit
540: Automatic video editing unit
550: Virtual template selection unit
600: Automatic video editing execution unit
700: Video file generation unit
800: Video file distribution unit
S1000: Automatic video editing method through the use of auto labeling and automatic selection of virtual templates
S100: Image file registration step
S110: Image file selection step
S120: Image file upload step
S200: Image preprocessing step
S300: Image group generation step
S310: Object recognition step
S320: Image file division step
S330: Object analysis step
S340: Image array formation step
S350: Image group generation step
S400: Image emotion classification step
S500: Automatic virtual template selection step
S510: Sound source beat extraction step
S520: Mood emotion analysis data generation step
S530: Background music selection step
S540: Automatic video editing step
S550: Virtual template selection step
S600: Automatic video editing execution step
S700: Video file generation step
S800: Video file distribution step

Claims

An automatic video editing method through the use of auto labeling and automatic selection of virtual templates, the method comprising:
an image group generation step in which an image group generation unit divides a plurality of image files into a plurality of clip images according to object data recognized from the image files, and groups the clip images based on the object data to generate at least one image group;
an image emotion classification step in which an image emotion classification unit performs image emotion analysis on the clip images based on the object data and classifies an image emotion category for the image files as a whole according to the analysis result;
an automatic virtual template selection step in which an automatic virtual template selection unit selects background music according to the classification result for the image emotion category, and automatically selects one virtual template from among a plurality of pre-made virtual templates for video editing according to the selected background music and the image emotion category; and
an automatic video editing execution step in which an automatic video editing execution unit automatically edits the image group according to the video editing method of the virtual template selected in the automatic virtual template selection step,
wherein the image files include at least one of a video file and a photo file, and
wherein the image group generation step comprises:
an object recognition step of recognizing person objects and thing objects in the image files;
an image file division step of, when an image file is a video file, dividing the video file according to the person objects and thing objects to generate a plurality of clip images;
an object analysis step of generating object analysis data by analyzing, for each clip image and photo file, the thing objects and at least one of the gender, age, behavior, and emotion of the person objects;
an image array formation step of inferring the context between the clip images and photo files based on the object analysis data, and automatically arranging the clip images and photo files according to the context inference result to form an image array; and
an image group formation step of automatically setting each context end point in the image array as a boundary point based on the context inference result, and grouping the image array at the set boundary points to form the image group.

The method of claim 1, further comprising:
an image file registration step in which an image file registration unit receives a plurality of image files from a user communication terminal and uploads them to an automatic editing server;
a video file generation step in which a video file generation unit combines the image groups automatically edited in the automatic video editing execution step to generate a video file; and
a video file distribution step in which a video file distribution unit transmits the video file to the user communication terminal for distribution.

The method of claim 2, wherein the image file registration step comprises:
an image file selection step of receiving a selection of at least one image file from among video files and photo files; and
an image file upload step of uploading the image file selected in the image file selection step.

The method of claim 2, further comprising an image preprocessing step of converting each image file uploaded in the image file registration step to a preset size, rotating each image file so that its orientation is aligned with a preset direction to normalize the image data it contains, and passing the result to the image group generation step.

(Deleted)

The method of claim 1, wherein the image emotion classification step analyzes the image emotion of the clip images and photo files based on the person objects and thing objects, and classifies the image emotion category for the image files as a whole according to the analysis result.

The method of claim 6, wherein the automatic virtual template selection step comprises:
a sound source beat extraction step of selecting a sound source according to the image emotion category and extracting the beats of the selected sound source;
a mood emotion analysis data generation step of analyzing the mood emotion of the beats of the sound source to generate mood emotion analysis data;
a background music selection step of matching the mood emotion analysis data against the image emotion category, selecting the beats of the sound source as background music for the image group based on the matching result, and providing the background music through the virtual template; and
a virtual template selection step of selecting one virtual template from among the plurality of pre-made virtual templates for video editing according to the background music and the image emotion category.

The method of claim 1, wherein the automatic video editing execution step, through the virtual template, automatically sets edit points for the image group according to the playback time of the background music, inserts text registered by the user at the edit points, inserts an image according to the image emotion category, sets the position and form of each inserted text to automatically edit the image group, and applies the background music to the edited image group.

An automatic video editing system through the use of auto labeling and automatic selection of virtual templates, the system comprising:
an image group generation unit that divides a plurality of image files into a plurality of clip images according to object data recognized from the image files, and groups the clip images to generate at least one image group;
an image emotion classification unit that performs image emotion analysis on the clip images based on the object data and classifies an image emotion category for the image files as a whole according to the analysis result;
an automatic virtual template selection unit that selects background music according to the classification result for the image emotion category, and automatically selects one virtual template from among a plurality of pre-made virtual templates for video editing according to the selected background music and the image emotion category; and
an automatic video editing execution unit that automatically edits the image group according to the video editing method of the virtual template selected by the automatic virtual template selection unit,
wherein the image files include at least one of a video file and a photo file, and
wherein the image group generation unit comprises:
an object recognition unit that recognizes person objects and thing objects in the image files;
an image file division unit that, when an image file is a video file, divides the video file according to the person objects and thing objects to generate a plurality of clip images;
an object analysis unit that generates object analysis data by analyzing, for each clip image and photo file, the thing objects and at least one of the gender, age, behavior, and emotion of the person objects;
an image array formation unit that infers the context between the clip images and photo files based on the object analysis data, and automatically arranges the clip images and photo files according to the context inference result to form an image array; and
an image group formation unit that automatically sets each context end point in the image array as a boundary point based on the context inference result, and groups the image array at the set boundary points to form the image group.

The system of claim 9, further comprising:
an image file registration unit that receives a plurality of image files from a user communication terminal and uploads them to an automatic editing server;
a video file generation unit that combines the image groups automatically edited by the automatic video editing execution unit to generate a video file; and
a video file distribution unit that transmits the video file to the user communication terminal for distribution.

The system of claim 10, wherein the image file registration unit comprises:
an image file selection unit that receives a selection of at least one image file from among video files and photo files; and
an image file upload unit that uploads the image file selected through the image file selection unit.

The system of claim 10, further comprising an image preprocessing unit that converts each image file uploaded through the image file registration unit to a preset size, rotates each image file so that its orientation is aligned with a preset direction to normalize the image data it contains, and passes the result to the image group generation unit.

(Deleted)

The system of claim 9, wherein the image emotion classification unit analyzes the image emotion of the clip images and photo files based on the person objects and thing objects, and classifies the image emotion category for the image files as a whole according to the analysis result.

The system of claim 14, wherein the automatic video editing execution unit comprises:
a sound source beat extraction unit that selects a sound source according to the image emotion category and extracts the beats of the selected sound source;
a mood emotion analysis data generation unit that analyzes the mood emotion of the beats of the sound source to generate mood emotion analysis data;
a background music selection unit that matches the mood emotion analysis data against the image emotion category, selects the beats of the sound source as background music for the image group based on the matching result, and provides the background music through the virtual template; and
a virtual template selection unit that selects one virtual template from among the plurality of pre-made virtual templates for video editing according to the background music and the image emotion category.

The system of claim 9, wherein the automatic video editing execution unit, through the virtual template, automatically sets edit points for the image group according to the playback time of the background music, inserts text registered by the user at the edit points, inserts an image according to the image emotion category, sets the position and form of each inserted text to automatically edit the image group, and applies the background music to the edited image group.