KR102762246B1

KR102762246B1 - The Method Of Producing Automatic Video Content Based On User Input

Info

Publication number: KR102762246B1
Application number: KR1020240067571A
Authority: KR
Inventors: 권택순; 배영하
Original assignee: (주)이스트소프트
Priority date: 2024-05-24
Filing date: 2024-05-24
Publication date: 2025-02-04
Anticipated expiration: 2044-05-24

Abstract

The present invention relates to a method for automatically producing video content based on user input. More specifically, a service server inputs user text received from a user terminal into a large language model to extract a title, a plurality of subtitles, and a plurality of subtexts, and generates one or more images corresponding to each of the subtitles and subtexts. The server then generates a plurality of slides including the title, the subtitles, and the generated images. Once the layout of the AI human video and the AI human character have been set through the user terminal, the server provides the user terminal with two things: an AI human video, in which the AI human speaks the subtexts and the slides are included, and an editing interface for generating the AI human video.

Description

{The Method Of Producing Automatic Video Content Based On User Input}

The present invention relates to a method for automatically producing video content based on user input. More specifically, a service server inputs user text received from a user terminal into a large language model to extract a title, a plurality of subtitles, and a plurality of subtexts, and generates one or more images corresponding to each of the subtitles and subtexts. The server then generates a plurality of slides including the title, the subtitles, and the generated images. Once the layout of the AI human video and the AI human character have been set through the user terminal, the server provides the user terminal with two things: an AI human video, in which the AI human speaks the subtexts and the slides are included, and an editing interface for generating the AI human video.

According to a survey by the mobile big-data platform 'Mobile Index', YouTube was the app most used by Koreans from December 2023 to March 2024, with 45.65 million monthly unique users in December 2023. Video content has recently become very popular on social media. Companies and election campaigns also produce videos and upload them to social media for advertising and promotion. Video content holds viewers' attention and interest, so it conveys messages effectively and makes them memorable.

However, producing a video is currently time-consuming and can require specialized knowledge and practice, so non-experts may find it very difficult. A service is therefore needed that automatically produces professional video content from simple user input. Such a technology would work as follows:
- When a user enters text, AI automatically generates related images.
- The system automatically produces a video in which an AI human speaks a script alongside those images, and provides it to the user.
- The user can then freely adjust the video's layout and each element it contains.

As prior art in this field, Korean Patent No. 10-2267673 discloses a method and system for automatically producing user-experience video content. In that prior art, after the user selects a template, the system receives photos, text, and music from the user, produces a video from them, and provides it to the user. However, as noted above, it does not disclose automatically producing a video that includes AI-generated images and an AI human speaking a script. Nor does it disclose a service that lets the user freely edit such a video. A technology that fills this gap is therefore needed.

Korean Patent No. 10-2267673 (June 16, 2021)

An object of the present invention is to provide a method for automatically producing video content based on user input, in which a service server:
- inputs user text received from a user terminal into a large language model to extract a title, a plurality of subtitles, and a plurality of subtexts;
- generates one or more images corresponding to each of the subtitles and subtexts;
- generates a plurality of slides including the title, the subtitles, and the generated images; and
- once the layout of the AI human video and the AI human character have been set through the user terminal, provides the user terminal with an AI human video, in which the AI human speaks the subtexts and the slides are included, together with an editing interface for generating the AI human video.

To achieve the above object, one embodiment of the present invention provides a method for automatically producing video content based on user input. The method is performed in a computing system including one or more processors and one or more memories, and comprises:
- a data automatic generation step of extracting a title, a plurality of subtitles, and a plurality of subtexts based on user text input from a user terminal, and generating one or more generated images related to the subtitles and subtexts;
- a slide automatic generation step of generating a plurality of slides including the title, the subtitles, and the generated images; and
- an AI human video generation step of generating an AI human video that includes the slides and an AI human speaking the subtexts, and providing an editing interface to the user terminal.
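The three steps can be illustrated as a simple data flow. The following is a minimal Python sketch. It assumes the large language model and the image generation model are supplied as plain callables. The names `generate_data`, `generate_slides`, `generate_video_spec`, and the `VideoSpec` fields are hypothetical and do not come from the patent. Rendering and TTS are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model interfaces: the LLM returns (title, [(subtitle, subtext), ...])
# and the image model returns encoded image bytes for a text prompt.
LLM = Callable[[str], Tuple[str, List[Tuple[str, str]]]]
ImageModel = Callable[[str], bytes]


@dataclass
class Section:
    subtitle: str
    subtext: str          # script the AI human speaks; also shown as a caption
    image: bytes = b""    # generated image for this section


@dataclass
class Slide:
    title: str
    subtitle: str
    image: bytes


@dataclass
class VideoSpec:
    slides: List[Slide]
    scripts: List[str]          # one script per slide, in playback order
    character: str = "default"  # hypothetical AI human character identifier


def generate_data(user_text: str, llm: LLM, image_model: ImageModel):
    """Data automatic generation step: extract text, then one image per section."""
    title, pairs = llm(user_text)
    sections = [Section(subtitle, subtext) for subtitle, subtext in pairs]
    for sec in sections:
        sec.image = image_model(f"{sec.subtitle}: {sec.subtext}")
    return title, sections


def generate_slides(title: str, sections: List[Section]) -> List[Slide]:
    """Slide automatic generation step: one slide per section."""
    return [Slide(title, sec.subtitle, sec.image) for sec in sections]


def generate_video_spec(slides: List[Slide], sections: List[Section],
                        character: str = "default") -> VideoSpec:
    """AI human video generation step (description only, no rendering)."""
    return VideoSpec(slides, [sec.subtext for sec in sections], character)
```

Injecting the models as callables keeps the sketch independent of any particular LLM or image-generation service. This fits the patent's statement that either model may run inside or outside the service server.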

In one embodiment of the present invention, the AI human video generation step may include:
- a layout setting step of providing the user terminal with a layout adjustment interface for adjusting the size and position of the title, subtitles, and generated images included in the AI human video;
- an AI human character setting step of setting, according to user input, information on the AI human character that will appear in the videos; and
- an editing interface providing step of providing the user terminal with an editing interface for generating the AI human video.

In one embodiment of the present invention, the layout adjustment interface includes:
- a title layer displaying the title of the entire video;
- a subtitle layer displaying the subtitle of each slide;
- a generated image layer displaying the generated image automatically placed on each slide;
- an AI human layer displaying the AI human who speaks the subtext of each slide; and
- a caption layer displaying the subtext as captions of the AI human video.

The positions and sizes of all five layers can be adjusted through the layout adjustment interface.
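One way to model the five layers is as named boxes in normalized frame coordinates. Adjustments are clamped so that a layer cannot leave the frame. This is a minimal sketch: the coordinate convention, the default positions, and the function `adjust_layer` are assumptions for illustration, not details from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Box:
    """Layer geometry as fractions of the frame (0.0 to 1.0)."""
    x: float
    y: float
    w: float
    h: float


def default_layout() -> Dict[str, Box]:
    # Hypothetical starting positions for the five layers named in the text.
    return {
        "title": Box(0.05, 0.03, 0.90, 0.10),
        "subtitle": Box(0.05, 0.15, 0.55, 0.08),
        "generated_image": Box(0.05, 0.25, 0.55, 0.55),
        "ai_human": Box(0.65, 0.20, 0.30, 0.75),
        "caption": Box(0.05, 0.85, 0.90, 0.10),
    }


def _clamp(v: float, lo: float, hi: float) -> float:
    return min(max(v, lo), hi)


def adjust_layer(layout: Dict[str, Box], name: str,
                 x: Optional[float] = None, y: Optional[float] = None,
                 w: Optional[float] = None, h: Optional[float] = None) -> Box:
    """Move or resize one layer, keeping it fully inside the frame."""
    if name not in layout:
        raise KeyError(f"unknown layer: {name}")
    old = layout[name]
    # Clamp the size first, then clamp the position using the new size.
    w = _clamp(old.w if w is None else w, 0.01, 1.0)
    h = _clamp(old.h if h is None else h, 0.01, 1.0)
    x = _clamp(old.x if x is None else x, 0.0, 1.0 - w)
    y = _clamp(old.y if y is None else y, 0.0, 1.0 - h)
    layout[name] = Box(x, y, w, h)
    return layout[name]
```

For example, dragging the AI human layer to `x=0.9` while widening it to `w=0.3` leaves it at `x≈0.7`, flush with the right edge.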

In one embodiment of the present invention, the data automatic generation step may include:
- a user text reception step of receiving, from the user terminal, user text related to the title or topic of the AI human video;
- a subtext extraction step of inputting the user text into a large language model inside or outside the service server to extract the title, the subtitles, and the subtexts to be included in the slides of the AI human video; and
- an image generation step of inputting each of the subtitles and subtexts into a deep-learning-based image generation model inside or outside the service server to generate one or more generated images corresponding to each of them.
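The patent does not specify the prompt or the model's output format for the subtext extraction step. A common approach, assumed here, is to request JSON and validate it before use. In this sketch, `PROMPT_TEMPLATE`, `build_prompt`, and `parse_llm_output` are hypothetical, and only the standard `json` module is used.

```python
import json
from typing import List, Tuple

FENCE = "`" * 3  # some models wrap JSON in a fenced block

# Hypothetical prompt; the patent does not specify the prompt or output format.
PROMPT_TEMPLATE = (
    "Write a short video script about the topic below. Return only JSON with "
    "keys 'title' and 'sections', where 'sections' is a list of objects with "
    "keys 'subtitle' and 'subtext'.\n\nTopic: {topic}"
)


def build_prompt(user_text: str) -> str:
    return PROMPT_TEMPLATE.format(topic=user_text.strip())


def parse_llm_output(raw: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (title, [(subtitle, subtext), ...]) from the model's JSON reply."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag)
        # and everything from the closing fence onward.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)
    title = str(data.get("title", "")).strip()
    sections = []
    for item in data.get("sections", []):
        subtitle = str(item.get("subtitle", "")).strip()
        subtext = str(item.get("subtext", "")).strip()
        if subtitle and subtext:  # skip incomplete sections
            sections.append((subtitle, subtext))
    if not title or not sections:
        raise ValueError("model output has no title or no usable sections")
    return title, sections
```

Rejecting replies without a title or usable sections lets the server retry the model before any images are generated.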

In one embodiment of the present invention, the slide automatic generation step may provide the user terminal with a plurality of layout templates, each arranging the title, the subtitle, and the generated image in a different layout. The slides are then generated by arranging the title, subtitles, and generated images according to the layout template selected at the user terminal.
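Applying a selected layout template can be sketched as a lookup of per-element boxes that is reused for every slide. The template names and coordinates below are hypothetical. The patent says only that templates arrange the same three elements in different layouts.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height) as frame fractions

# Hypothetical templates: same three elements, different arrangements.
LAYOUT_TEMPLATES: Dict[str, Dict[str, Rect]] = {
    "image_left": {"title": (0.05, 0.03, 0.90, 0.10),
                   "subtitle": (0.55, 0.20, 0.40, 0.10),
                   "image": (0.05, 0.20, 0.45, 0.60)},
    "image_right": {"title": (0.05, 0.03, 0.90, 0.10),
                    "subtitle": (0.05, 0.20, 0.40, 0.10),
                    "image": (0.50, 0.20, 0.45, 0.60)},
    "image_full": {"title": (0.05, 0.03, 0.90, 0.10),
                   "subtitle": (0.05, 0.85, 0.90, 0.10),
                   "image": (0.00, 0.15, 1.00, 0.68)},
}


def build_slides(title: str, sections: List[Tuple[str, str]],
                 template_name: str) -> List[dict]:
    """Place the title, subtitle, and image of each section using the chosen template.

    `sections` holds (subtitle, image_ref) pairs; image_ref is whatever handle
    the image model returned (a URL, path, or ID).
    """
    if template_name not in LAYOUT_TEMPLATES:
        raise KeyError(f"unknown template: {template_name}")
    boxes = LAYOUT_TEMPLATES[template_name]
    slides = []
    for subtitle, image_ref in sections:
        slides.append({
            "template": template_name,
            "elements": [
                {"kind": "title", "content": title, "box": boxes["title"]},
                {"kind": "subtitle", "content": subtitle, "box": boxes["subtitle"]},
                {"kind": "image", "content": image_ref, "box": boxes["image"]},
            ],
        })
    return slides
```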

In one embodiment of the present invention, the subtext is a script to be spoken by the AI human, and the editing interface includes:
- a preview layer providing a preview of the generated AI human video; and
- a script layer providing an editing function for the subtexts.

The script layer includes a plurality of subtext layers, one for each subtext, each displaying its subtext and providing an editing function for it. When the subtext in a subtext layer is converted to speech by TTS, the playback time of the resulting audio may be displayed in that subtext layer.
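The patent states only that each subtext layer displays the TTS playback time. Assuming the TTS engine returns PCM WAV audio, the duration can be read from the frame count and sample rate with Python's standard `wave` module. `silent_wav` is a hypothetical stand-in for a real TTS call, included only so the sketch runs offline.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Playback time of a PCM WAV clip, e.g. the TTS output for one subtext."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def format_playback_time(seconds: float) -> str:
    """Format as MM:SS.s for display in a subtext layer."""
    tenths = round(seconds * 10)
    minutes, rest = divmod(tenths, 600)
    return f"{minutes:02d}:{rest / 10:04.1f}"


def silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Stand-in for a TTS engine: mono 16-bit silence of the given length."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

Measuring the synthesized audio, rather than estimating from character count, keeps the displayed time consistent with what the preview actually plays.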

In one embodiment of the present invention, a gesture input layer may be displayed in the subtext layer. The gesture input layer includes a gesture setting interface for setting, for the subtext of that subtext layer, the gesture type and the occurrence position of the AI human character's gesture in the AI human video. Once a gesture type and occurrence position have been set for that subtext, information on the gesture may be overlaid on the subtext's summary block. The overlay appears at the position within the block corresponding to the set occurrence position.
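The text does not define how an occurrence position maps onto the summary block. Assume the position is stored as a character offset into the subtext and the block spans the subtext proportionally. The overlay position then reduces to a linear mapping. `Gesture`, `marker_x`, and `overlay_markers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gesture:
    kind: str         # hypothetical gesture name, e.g. "wave" or "point"
    char_offset: int  # occurrence position: index in the subtext where it starts


def marker_x(gesture: Gesture, subtext: str, block_width_px: float) -> float:
    """Horizontal position of the gesture marker inside the summary block."""
    if not subtext:
        return 0.0
    offset = min(max(gesture.char_offset, 0), len(subtext))
    return block_width_px * offset / len(subtext)


def overlay_markers(subtext: str, gestures: List[Gesture],
                    block_width_px: float) -> List[Tuple[float, str]]:
    """(x, gesture kind) pairs sorted left to right, ready to draw as an overlay."""
    return sorted((round(marker_x(g, subtext, block_width_px), 1), g.kind)
                  for g in gestures)
```

If the TTS engine exposes per-word timings, a duration-proportional mapping would be a natural alternative to the character-based one shown here.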

In one embodiment of the present invention, the preview layer includes:
- a preview playback layer displaying icons for playback operations, such as playing and stopping the AI human video; and
- a time-series summary layer that summarizes the AI human video shown in the preview playback layer along its timeline.

The time-series summary layer may include a timeline layer containing a manipulation-axis element. Moving this element moves the playback position of the video in the preview playback layer.
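Moving the manipulation-axis element amounts to two lookups: map its pixel offset on the timeline to a time, then find which segment covers that time. The sketch below assumes each segment's duration is the TTS playback time of its subtext. The function names are hypothetical.

```python
import bisect
from itertools import accumulate
from typing import List


def playhead_to_time(x_px: float, track_width_px: float, total_s: float) -> float:
    """Convert the element's pixel position on the timeline to seconds."""
    x = min(max(x_px, 0.0), track_width_px)
    return total_s * x / track_width_px


def active_segment(t: float, durations: List[float]) -> int:
    """Index of the subtext being spoken at time t (segments are [start, end))."""
    if not durations:
        raise ValueError("no segments")
    ends = list(accumulate(durations))  # cumulative end time of each segment
    t = min(max(t, 0.0), ends[-1])
    return min(bisect.bisect_right(ends, t), len(durations) - 1)
```

Using `bisect_right` on cumulative end times makes a boundary instant belong to the segment that starts there. The final clamp keeps the very end of the video on the last segment.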

In one embodiment of the present invention, the editing interface further includes:
- an AI human character selection layer for selecting the number and models of AI human characters in the AI human video; and
- an AI human character setting layer for setting the style, angle, pose, size, and position of each AI human character.

The AI human character setting layer may be displayed in response to either of two selection inputs: selecting an AI human character shown in the video playing in the preview playback layer, or selecting the summary block corresponding to a subtext.
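The character settings named above (model, style, angle, pose, size, position) map naturally onto a validated record per character. The allowed option values, the scale range, and the two-character limit below are hypothetical. The patent lists the settings but not their permitted values.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical option sets; the patent names the settings but not their values.
ANGLES = ("front", "left", "right")
POSES = ("standing", "sitting")


@dataclass
class CharacterSettings:
    model_id: str
    style: str = "default"
    angle: str = "front"
    pose: str = "standing"
    scale: float = 1.0                            # size relative to the default
    position: Tuple[float, float] = (0.75, 0.55)  # center, as frame fractions

    def __post_init__(self):
        if self.angle not in ANGLES:
            raise ValueError(f"unsupported angle: {self.angle}")
        if self.pose not in POSES:
            raise ValueError(f"unsupported pose: {self.pose}")
        if not 0.1 <= self.scale <= 2.0:
            raise ValueError("scale must be between 0.1 and 2.0")
        x, y = self.position
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError("position must lie inside the frame")


def select_characters(model_ids: List[str],
                      max_characters: int = 2) -> List[CharacterSettings]:
    """AI human character selection: choose how many characters and which models."""
    if not 1 <= len(model_ids) <= max_characters:
        raise ValueError(f"choose between 1 and {max_characters} characters")
    return [CharacterSettings(model_id) for model_id in model_ids]
```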

According to one embodiment of the present invention, a user can obtain an AI human video simply by entering text.

According to one embodiment of the present invention, subtitles and subtexts can be generated by inputting the user text received from the user terminal into a large language model.

According to one embodiment of the present invention, an image corresponding to each subtitle and subtext can be generated automatically by a deep-learning-based image generation model.

According to one embodiment of the present invention, the user can set the layout of the title, subtitles, and generated images in the AI human video through the layout templates provided to the user terminal.

According to one embodiment of the present invention, the user can edit the positions and sizes of the title, subtitles, generated images, AI human, and captions in the AI human video through the layout adjustment interface.

According to one embodiment of the present invention, the user can edit each layer of the AI human video through the editing interface.

According to one embodiment of the present invention, an AI human video in which the AI human speaks the subtexts can be generated by converting the subtexts to speech with TTS.

According to one embodiment of the present invention, a plurality of slides including the title, subtitles, and generated images are generated automatically, and the user can edit each slide through the editing interface.

According to one embodiment of the present invention, an AI human video including the title, subtitles, captions, generated images, and an AI human can be generated automatically.

According to one embodiment of the present invention, an AI human can be generated according to the AI human character information entered by the user.

According to one embodiment of the present invention, the user can set a gesture for each of a plurality of AI human characters in the AI human video. Information on the gestures set by the user can then be displayed in the corresponding summary block of the AI human video.

According to one embodiment of the present invention, the user can change the playback position of the AI human video by moving an element displayed on the timeline.

According to one embodiment of the present invention, the script for each subtext and the AI human character information can be entered and modified. The AI human video regenerated to reflect those changes can be checked in real time.

According to one embodiment of the present invention, the user can set a background image, background video, and background audio for the AI human video. The AI human video regenerated with that background applied can be checked in real time.

FIG. 1 schematically illustrates the internal configuration of a service server according to one embodiment of the present invention.
FIG. 2 schematically illustrates the internal configuration of a data automatic generation unit according to one embodiment of the present invention.
FIG. 3 schematically illustrates the process of performing the data automatic generation step according to one embodiment of the present invention.
FIG. 4 schematically illustrates the process of performing the data automatic generation step, the slide automatic generation step, and the AI human video generation step according to one embodiment of the present invention.
FIG. 5 schematically illustrates the process of performing the method for automatically producing video content based on user input according to one embodiment of the present invention.
FIG. 6 schematically illustrates a layout template according to one embodiment of the present invention.
FIGS. 7 to 14 illustrate an editing interface for editing the AI human video according to one embodiment of the present invention.
FIG. 15 illustrates, by way of example, the internal configuration of a computing device according to one embodiment of the present invention.

Hereinafter, various embodiments and/or aspects are disclosed with reference to the drawings. In the following description, numerous specific details are set forth for purposes of explanation, to provide a general understanding of one or more aspects. However, those of ordinary skill in the art will recognize that such aspect(s) may be practiced without these specific details. The following description and the accompanying drawings set forth certain illustrative aspects in detail. These aspects are illustrative only. They represent some of the various ways in which the principles of the various aspects may be employed, and the description is intended to encompass all such aspects and their equivalents.

In addition, terms including ordinal numbers, such as "first" and "second", may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, a first component may be called a second component, and similarly a second component may be called a first component, without departing from the scope of the present invention. The term "and/or" includes any combination of a plurality of related listed items, or any one of them.

In addition, unless otherwise defined in the embodiments of the present invention, all terms used herein, including technical and scientific terms, have the same meanings as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art. They shall not be interpreted in an idealized or overly formal sense unless explicitly so defined in the embodiments of the present invention.

The "user terminal" referred to below may be implemented as a computer or a portable terminal that can connect to a server or other terminals over a network. Here, the computer includes, for example, a notebook, desktop, or laptop equipped with a web browser. The portable terminal is a wireless communication device that ensures portability and mobility, and may include any kind of handheld wireless communication device, such as a smartphone, PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (Wideband Code Division Multiple Access), WiBro (Wireless Broadband Internet), or BLE (Bluetooth Low Energy) beacon terminal.
In addition, the "network" may be implemented as a wired network such as a local area network (LAN), wide area network (WAN), or value-added network (VAN), or as any kind of wireless network such as a mobile radio communication network or satellite communication network.

1. Method for Automatically Producing Video Content Based on User Input

The following describes the components used in the method for automatically producing video content based on user input, and the steps performed by each component.

FIG. 1 schematically illustrates the internal configuration of a service server (1) according to one embodiment of the present invention.

As illustrated in FIG. 1, the service server (1) includes:
- a data automatic generation unit (100) that performs the data automatic generation step of extracting a title, a plurality of subtitles, and a plurality of subtexts based on user text input from a user terminal, and generating one or more generated images related to the subtitles and subtexts;
- a slide automatic generation unit (200) that performs the slide automatic generation step of generating a plurality of slides including the title, the subtitles, and the generated images; and
- an AI human video generation unit (300) that performs the AI human video generation step of generating an AI human video that includes the slides and an AI human speaking the subtexts, and providing an editing interface to the user terminal.

Specifically, each component of the service server (1) shown in FIG. 1 controls the operation of the service server (1). The service server (1) performs the method for automatically producing video content based on user input in a computing system including one or more processors and one or more memories.

More specifically, the data automatic generation unit (100) of the service server (1) may extract a title, a plurality of subtitles, and a plurality of subtexts based on user text input from a user terminal, and generate one or more generated images related to the subtitles and subtexts. In one embodiment of the present invention, these elements are defined as follows:
- The user text is text received from the user terminal concerning the title or topic of the AI human video.
- The title is the title of the entire AI human video.
- Each subtitle summarizes one of the plurality of slides included in the AI human video.
- Each subtext contains the script the AI human will speak on the corresponding slide of the AI human video, and the content shown in the video's captions.
- The generated images are images related to each of the subtitles and subtexts.

Preferably, one generated image is derived for each subtitle and subtext.
In addition, the data automatic generation step includes:
- a user text reception step of receiving, from the user terminal, user text related to the title or topic of the AI human video;
- a subtext extraction step of inputting the user text into a large language model inside or outside the service server (1) to extract the title, subtitles, and subtexts to be included in the slides of the AI human video; and
- an image generation step of inputting each of the subtitles and subtexts into a deep-learning-based image generation model inside or outside the service server (1) to generate one or more generated images corresponding to each of them.

The slide automatic generation unit (200) of the service server (1) may generate a plurality of slides including the title, the subtitles, and the generated images. In one embodiment of the present invention, the slide automatic generation step may provide the user terminal with a plurality of layout templates, each arranging the title, subtitle, and generated image in a different layout. The slides are then generated by arranging the title, subtitles, and generated images according to the layout template selected at the user terminal.

The AI human video generation unit (300) of the service server (1) may generate an AI human video including the slides and an AI human speaking the subtexts, and may provide an editing interface to the user terminal. The AI human video generation unit (300) includes: a layout setting unit (11) that performs a layout setting step of providing the user terminal with a layout adjustment interface for adjusting the size and position of the title, subtitles, and generated images included in the AI human video; an AI human character setting unit (12) that performs an AI human character setting step in which information on the AI human character to be rendered in the videos is set according to the user's input; an editing interface providing unit (13) that performs an editing interface providing step of providing the user terminal with an editing interface for generating the AI human video; and a video content production unit (14) that performs a video content production step of producing the AI human video edited on the user terminal through the editing interface as video content and providing the final AI human video to the user terminal.

In one embodiment of the present invention, the layout adjustment interface may include: a title layer displaying the title of the entire video; a subtitle layer displaying the subtitle of each slide; a generated image layer displaying the generated image automatically placed on each slide; an AI human layer displaying the AI human who speaks the subtext of each slide; and a caption layer displaying the subtext as captions of the AI human video. In addition, the editing interface may include a preview layer providing a preview of the generated AI human video, and a script layer providing editing functions for the subtexts. The script layer may include a plurality of subtext layers, one for each subtext, each of which displays its subtext and provides editing functions for it. When the subtext in a subtext layer is converted to speech by TTS, the playback time of the resulting audio may be displayed in that subtext layer.
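The playback time shown in a subtext layer can be read directly from the synthesized audio. The sketch below assumes the TTS engine returns WAV bytes, which the text does not specify; `make_silent_wav` is a hypothetical stand-in for a real TTS call, and the `mm:ss` label format is likewise an assumption.

```python
import io
import wave

def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Playback time of a TTS result delivered as WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()

def format_playback_time(seconds: float) -> str:
    """mm:ss label, as might be shown on a subtext layer."""
    total = int(round(seconds))
    return f"{total // 60:02d}:{total % 60:02d}"

def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Hypothetical stand-in for a TTS engine: mono 16-bit silence of the given length."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()
```

Measuring the actual audio rather than estimating from character count means the displayed time stays correct when the user edits the subtext and it is re-synthesized.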

In one embodiment of the present invention, the user terminal may be a terminal used by a user of the functions related to the user input-based automatic video content production method; the user may access the service server (1) through the user terminal and use the functions or services it provides. The user terminal is a computing device including one or more processors and one or more memories, and may be a laptop, tablet, smartphone, or desktop computer. The components of the service server (1) described above are the elements essential for implementing the present invention; the service server is not limited to them and may further include a database and the like.

FIG. 2 schematically illustrates the internal configuration of the automatic data generation unit (100) according to one embodiment of the present invention.

As illustrated in FIG. 2, the automatic data generation unit (100) includes: a user text reception unit (110) that performs the user text reception step of receiving, from the user terminal, user text related to the title or subject of the AI human video; a subtext extraction unit (120) that performs the subtext extraction step of inputting the user text into a large language model inside or outside the service server (1) to extract the title, the plurality of subtitles, and the plurality of subtexts to be included in the slides of the AI human video; and an image generation unit (130) that performs the image generation step of inputting each of the plurality of subtitles and the plurality of subtexts into a deep learning-based image generation model inside or outside the service server (1) to generate one or more generated images corresponding to each of them.

Preferably, in the user text reception step, when the user transmits to the service server (1), via the user terminal, user text containing the title or subject of the video to be produced as an AI human video, the service server (1) receives the user text, automatically generates an AI human video related to it, and provides the AI human video and an editing interface to the user terminal. The user can then edit the AI human video through the editing interface to produce the final AI human video.

The subtext extraction unit (120) may input the user text received from the user terminal into a large language model inside or outside the service server (1) to extract a title, a plurality of subtitles, and a plurality of subtexts. The title and the subtitles may correspond to the title and subtitles to be included in the slides of the AI human video, and the subtexts may correspond to the script to be spoken by the AI human in the AI human video. In one embodiment of the present invention, the subtexts may also be used as the captions displayed in the AI human video.

The image generation unit (130) may input each of the plurality of subtitles and subtexts extracted by the large language model into a deep learning-based image generation model inside or outside the service server (1) to generate one or more generated images. Each generated image corresponds to one of the subtitles and subtexts. Preferably, one generated image is derived for each subtitle and its subtext, and each slide in the AI human video contains one subtitle, one subtext, and one generated image covering a single topic.

FIG. 3 schematically illustrates how the automatic data generation step is performed according to one embodiment of the present invention.

As illustrated in FIG. 3, the user text may be input into the large language model to extract subtitles and subtexts, and the subtitles and subtexts may then be input into the image generation model to produce the generated images.

Specifically, the user text received from the user terminal may be input into a large language model inside or outside the service server (1) to extract a plurality of subtitles and a plurality of subtexts, and preferably a title as well. If the user text is a single sentence, the content is expanded from it to produce the material to be included in the AI human video; if the user text consists of several sentences or a document, its content is summarized or organized to extract that material. In one embodiment of the present invention, when the service server (1) inputs the user text into the large language model, it may request that the model extract the title of the AI human video, the subtitle of each slide included in the AI human video, and the subtext that the AI human will speak for each slide.
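The expand-versus-summarize behavior and the structured request can be sketched as prompt construction plus response validation. Everything here is an assumption for illustration: the sentence-counting heuristic, the prompt wording, the default of five slides, and the JSON response shape are not taken from the patent.

```python
import json
import re

def count_sentences(text: str) -> int:
    # Rough heuristic: split after sentence-ending punctuation followed by whitespace.
    parts = [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]
    return len(parts)

def build_prompt(user_text: str, num_slides: int = 5) -> str:
    """Expand a one-sentence topic; summarize a longer text or document."""
    if count_sentences(user_text) <= 1:
        mode = "Expand this topic into"
    else:
        mode = "Summarize and organize this document into"
    return (
        f"{mode} a video with {num_slides} slides.\n"
        "Return JSON with keys: 'title' (whole video) and 'slides', a list of objects "
        "with 'subtitle' (slide summary) and 'subtext' (script the AI human speaks).\n"
        f"Input:\n{user_text}"
    )

def parse_response(raw: str) -> dict:
    """Validate the model's JSON before it is used to build slides."""
    data = json.loads(raw)
    if not isinstance(data.get("title"), str):
        raise ValueError("missing title")
    for slide in data.get("slides", []):
        if not {"subtitle", "subtext"}.issubset(slide):
            raise ValueError("each slide needs subtitle and subtext")
    return data
```

Asking for a fixed JSON shape makes the extraction step machine-checkable, so a malformed answer can be rejected or retried instead of producing broken slides.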

The plurality of subtitles and subtexts extracted by the large language model may then each be input into a deep learning-based image generation model inside or outside the service server (1) to generate one or more generated images corresponding to each of them. Preferably, one generated image is derived for each subtitle and its subtext; in one embodiment of the present invention, when the service server (1) inputs each subtitle and subtext into the image generation model, it may request that one image be generated per subtitle and subtext. The title, subtitles, and subtexts extracted by the large language model, together with the one or more images generated by the image generation model, are distributed across the plurality of slides and displayed in the AI human video. The automatic data generation unit (100) can thus generate this data automatically using the large language model and the image generation model.
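The one-image-per-pair request can be enforced at the call site. In this sketch the `ImageClient` signature (prompt plus requested count), the prompt template, and the 300-character truncation are hypothetical assumptions; real image-generation APIs differ in how they accept a count.

```python
from typing import Callable, Dict, List

# Hypothetical image-model client: takes a prompt and a requested count, returns images.
ImageClient = Callable[[str, int], List[bytes]]

def image_prompt(subtitle: str, subtext: str, max_chars: int = 300) -> str:
    """Build one prompt from one subtitle/subtext pair (truncation length is an assumption)."""
    return f"Illustration for '{subtitle}': {subtext}"[:max_chars]

def generate_images(pairs: List[Dict[str, str]], client: ImageClient) -> List[bytes]:
    """Request exactly one image per pair and check that exactly one came back."""
    images = []
    for pair in pairs:
        result = client(image_prompt(pair["subtitle"], pair["subtext"]), 1)
        if len(result) != 1:
            raise RuntimeError("image model must return exactly one image per pair")
        images.append(result[0])
    return images
```

Keeping the output list aligned one-to-one with the input pairs is what lets each slide later receive exactly one subtitle, one subtext, and one image.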

If the data included in the AI human video could not be generated automatically, the user would have to enter the title, script, and images for the video content manually. In the user input-based automatic video content production method of the present invention, however, the automatic data generation step generates the title, the plurality of subtitles, the plurality of subtexts, and one or more images of the AI human video automatically, so the user can obtain higher-quality video content by entering only a brief topic for the video to be produced.

FIG. 4 schematically illustrates how the automatic data generation step, the automatic slide generation step, and the AI human video generation step are performed according to one embodiment of the present invention.

In outline, FIG. 4(a) shows the operation of the automatic data generation unit (100), which performs the automatic data generation step; FIG. 4(b) shows the operation of the automatic slide generation unit (200), which performs the automatic slide generation step; and FIG. 4(c) shows the operation of the AI human video generation unit (300), which performs the AI human video generation step.

Specifically, the automatic data generation unit (100) shown in FIG. 4(a) performs the automatic data generation step of extracting a title, a plurality of subtitles, and a plurality of subtexts based on user text input from the user terminal, and generating one or more generated images related to the subtitles and subtexts. Upon receiving user text from the user terminal, the automatic data generation unit (100) can automatically generate the title, subtitles, subtexts, and one or more generated images to be included in the AI human video.

More specifically, the user text may be input into a large language model inside or outside the service server (1) to extract the title, the plurality of subtitles, and the plurality of subtexts, and each of the subtitles and subtexts may be input into a deep learning-based image generation model inside or outside the service server (1) to generate one or more generated images corresponding to each of them.

The automatic slide generation unit (200) shown in FIG. 4(b) performs the automatic slide generation step of generating a plurality of slides including the title, the subtitles, and the generated images. The automatic slide generation unit (200) can automatically generate a plurality of slides containing the title, the subtitles, and the one or more generated images produced by the automatic data generation unit (100).

More specifically, a plurality of layout templates in which the title, subtitle, and generated image are arranged in different layouts are provided to the user terminal, and the plurality of slides are generated by arranging the title, subtitles, and generated images in the form of the layout template selected on the user terminal. Preferably, the position and size of each title, subtitle, and generated image placed on a slide via the layout template can later be adjusted directly by the user through the layout adjustment interface provided to the user terminal in the layout setting step of the subsequent AI human video generation step.

Preferably, the slides are the successive screens of the AI human video: as the screen transitions, the subtitle and generated image shown change. Each slide may contain a subtitle, a generated image, and the AI human, so that the generated image related to that subtitle is displayed while the AI human speaks the script related to it. The text of the script spoken by the AI human may also be displayed verbatim on screen as captions for the AI human video.
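If each slide stays on screen for as long as the AI human takes to speak its subtext, the slide (and caption) shown at any moment follows from the slide durations. This is a sketch under that assumption; the patent does not state how slide timing is derived, and the function names are hypothetical.

```python
import bisect
from itertools import accumulate
from typing import List

def slide_start_times(durations: List[float]) -> List[float]:
    """Start time of each slide when slides play back to back (durations in seconds)."""
    return [0.0] + list(accumulate(durations))[:-1]

def active_slide(durations: List[float], t: float) -> int:
    """Index of the slide on screen at time t."""
    if not 0 <= t < sum(durations):
        raise ValueError("time is outside the video")
    return bisect.bisect_right(slide_start_times(durations), t) - 1

def caption_at(subtexts: List[str], durations: List[float], t: float) -> str:
    """The caption is the active slide's subtext, shown verbatim."""
    return subtexts[active_slide(durations, t)]
```

A binary search over the start times keeps the lookup cheap, which matters if the same mapping drives a preview that is scrubbed frequently.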

The AI human video generation unit (300) shown in FIG. 4(c) performs the AI human video generation step of generating an AI human video including the slides and an AI human speaking the subtexts, and providing an editing interface to the user terminal. The AI human video generation unit (300) can generate an AI human video containing the plurality of slides produced by the automatic slide generation unit (200).

More specifically, when a layout adjustment interface for adjusting the size and position of the title, subtitles, and generated images included in the AI human video is provided to the user terminal, the user can adjust the size and position of the title, subtitle, and generated image on each slide arranged according to the layout template. In addition, information on the AI human character is set according to the user's input, and the AI human to be included in the AI human video is generated. An editing interface for producing the AI human video is then provided to the user terminal, through which the user can edit each layer of the AI human video, and the final edited AI human video is produced from the edits made on the user terminal.
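A layout adjustment can be modeled as moving and resizing a rectangular layer while keeping it on the canvas. The pixel-based `Layer` type, scaling about the top-left corner, and the clamping policy are assumptions for this sketch; the patent only says size and position are adjustable.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Layer:
    name: str
    x: int
    y: int
    width: int
    height: int

def adjust_layer(layer: Layer, dx: int, dy: int, scale: float,
                 canvas_w: int, canvas_h: int) -> Layer:
    """Move by (dx, dy), scale about the top-left corner, and keep the layer on the canvas."""
    w = max(1, min(canvas_w, round(layer.width * scale)))
    h = max(1, min(canvas_h, round(layer.height * scale)))
    x = min(max(0, layer.x + dx), canvas_w - w)
    y = min(max(0, layer.y + dy), canvas_h - h)
    return replace(layer, x=x, y=y, width=w, height=h)
```

Returning a new frozen `Layer` instead of mutating in place makes it straightforward to keep an undo history of layout edits.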

In one embodiment of the present invention, the editing interface includes a preview layer providing a preview of the generated AI human video, and a script layer providing editing functions for the subtexts. The script layer includes a plurality of subtext layers, one for each subtext, each of which displays its subtext and provides editing functions for it; when the subtext in a subtext layer is converted to speech by TTS, the playback time of the resulting audio may be displayed in that subtext layer.
In addition, a gesture input layer may be displayed in each subtext layer. The gesture input layer may include a gesture setting interface for setting the type of gesture the AI human character performs in the AI human video for the subtext in that subtext layer, and the position at which the gesture is expressed. When a gesture type and expression position are set for a subtext through the gesture input layer, information about the gesture may be overlaid on the summary block for that subtext, at the point within the block corresponding to the set expression position. The preview layer may include a preview playback layer displaying icons for playback operations such as playing and stopping the AI human video, and a time-series summary layer that summarizes and displays information about the AI human video shown in the preview playback layer along its time axis. The time-series summary layer may include a timeline layer in which moving the manipulation-axis element it contains moves the playback point of the video in the preview playback layer. In addition, the editing interface may further include an AI human character selection layer for selecting the number and models of AI human characters in the AI human video, and an AI human character setting layer for setting the style, angle, pose, size, and position of each AI human character. The AI human character setting layer may be displayed in response to selecting an AI human character shown in the AI human video playing in the preview playback layer, or to a selection in the summary block corresponding to a subtext.
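Overlaying gesture information on a summary block requires mapping an expression position in the subtext to a point in the block. This sketch assumes the expression position is a character offset and that the block's width is proportional to the text length; both are hypothetical, since the patent does not define the unit of the expression position. The gesture names are illustrative only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Gesture:
    kind: str        # e.g. "wave", "nod" (hypothetical gesture names)
    char_index: int  # expression position, assumed to be a character offset in the subtext

@dataclass
class Overlay:
    kind: str
    x: float  # horizontal offset inside the summary block, in pixels

def gesture_overlays(subtext: str, gestures: List[Gesture], block_width: float) -> List[Overlay]:
    """Place each gesture marker proportionally along the subtext's summary block."""
    n = len(subtext)
    if n == 0:
        return []
    overlays = []
    for g in sorted(gestures, key=lambda g: g.char_index):
        if not 0 <= g.char_index <= n:
            raise ValueError(f"gesture position {g.char_index} outside subtext")
        overlays.append(Overlay(g.kind, block_width * g.char_index / n))
    return overlays
```

If the expression position were instead stored as a time offset, the same proportional mapping would apply using the subtext's TTS playback time in place of its character length.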

FIG. 5 schematically illustrates how the user input-based automatic video content production method is performed according to one embodiment of the present invention.

As shown in FIG. 5, the service server (1) may receive user text from the user terminal, generate an AI human video, and return the AI human video to the user terminal.

Specifically, the service server (1) may input the user text received from the user terminal into a large language model inside or outside the service server (1) to extract subtitles and subtexts; in one embodiment of the present invention, the title of the AI human video may be extracted at the same time. The subtitles and subtexts are then input into an image generation model to generate one or more generated images corresponding to each of them; in one embodiment of the present invention, the image generation model may be a deep learning-based image generation model inside or outside the service server (1). Next, a plurality of slides including the subtitles and generated images are generated, and an AI human video based on those slides is produced.
In the slide generation step, a plurality of layout templates in which the title, subtitle, and generated image are arranged in different layouts may be provided to the user terminal, and the slides may be generated by arranging the title, subtitles, and generated images in the form of the template selected on the user terminal. In the AI human video generation step, a layout adjustment interface for adjusting the size and position of the title, subtitles, and generated images included in the AI human video may be provided to the user terminal, and the slides may be modified according to the layout adjusted there; when information on the AI human character is set according to input from the user terminal, the corresponding AI human is generated and included in each slide. After the AI human video is produced from the modified slides and the AI human, the AI human video and an editing interface for producing it are provided to the user terminal, where the AI human video can be edited. The AI human video edited through the editing interface is then produced as video content, and the final AI human video is provided to the user terminal.

FIG. 6 schematically illustrates layout templates according to one embodiment of the present invention.

In outline, FIGS. 6(a), 6(b), 6(c), and 6(d) show examples of layout templates.

Specifically, the layout templates comprise a plurality of templates in which the title, subtitle, and generated image are arranged in different layouts. After the layout templates are provided to the user terminal, the plurality of slides may be generated by arranging the title, subtitles, and generated images in the form of the template selected on the user terminal.

As shown in FIG. 6, FIGS. 6(a), 6(b), and 6(c) are layout templates for a landscape (horizontally long) screen, and FIG. 6(d) is a layout template for a portrait (vertically long) screen. Preferably, a layout template may include a title, a subtitle, a generated image, an AI human, and captions. In one embodiment of the present invention, however, the AI human may be omitted and the generated image shown as the background of the AI human video, as in FIG. 6(c); or the generated image may be omitted and the AI human speaking the script may act as a presenter in the center of the AI human video, as in FIG. 6(d). Likewise, not only the AI human and the generated image but also the title, subtitle, or captions may be omitted from the AI human video.
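Orientation and optional elements can both be expressed in the template data itself. This sketch assumes 1920x1080 and 1080x1920 canvases, normalized boxes, and `None` for an element a template hides; the two templates are hypothetical and only loosely modeled on FIG. 6(c) and FIG. 6(d).

```python
from typing import Dict, Tuple

CANVAS = {"landscape": (1920, 1080), "portrait": (1080, 1920)}  # assumed resolutions

# Hypothetical templates loosely modeled on FIG. 6(c) and FIG. 6(d).
# Boxes are normalized (x, y, width, height); None marks an element the template hides.
ORIENTED_TEMPLATES = {
    "background_image": {
        "orientation": "landscape",
        "elements": {"image": (0.0, 0.0, 1.0, 1.0), "ai_human": None,
                     "subtitle": (0.05, 0.05, 0.9, 0.1), "caption": (0.1, 0.85, 0.8, 0.1)},
    },
    "presenter": {
        "orientation": "portrait",
        "elements": {"image": None, "ai_human": (0.2, 0.25, 0.6, 0.6),
                     "subtitle": (0.05, 0.05, 0.9, 0.08), "caption": (0.05, 0.88, 0.9, 0.08)},
    },
}

def to_pixels(template_name: str) -> Dict[str, Tuple[int, int, int, int]]:
    """Convert a template's normalized boxes to pixel boxes, skipping hidden elements."""
    template = ORIENTED_TEMPLATES[template_name]
    cw, ch = CANVAS[template["orientation"]]
    boxes = {}
    for name, box in template["elements"].items():
        if box is None:
            continue
        x, y, w, h = box
        boxes[name] = (round(x * cw), round(y * ch), round(w * cw), round(h * ch))
    return boxes
```

Normalized coordinates let one template description serve any output resolution of the same orientation.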

In one embodiment of the present invention, the layout adjustment interface may be provided to the user terminal in a form similar to the layout templates shown in FIG. 6. The layout adjustment interface includes: a title layer displaying the title of the entire video; a subtitle layer displaying the subtitle of each slide; a generated image layer displaying the generated image automatically placed on each slide; an AI human layer displaying the AI human who speaks the subtext of each slide; and a caption layer displaying the subtext as captions of the AI human video. The position and size of the title layer, subtitle layer, generated image layer, AI human layer, and caption layer can be adjusted through the layout adjustment interface. For example, referring to FIG. 6(a), in the layout adjustment interface the title corresponds to the title layer, the subtitle to the subtitle layer, the generated image to the generated image layer, the AI human to the AI human layer, and the captions to the caption layer.

Preferably, layout templates for an AI human video displaying a title, subtitle, generated image, AI human, and captions may be produced in various forms beyond those in FIGS. 6(a) to 6(d) and provided to the user terminal. On the user terminal, the arrangement of the title, subtitle, generated image, AI human, and captions is first set by choosing a layout template, and their size and shape can then be further adjusted through the layout adjustment interface provided afterward.

2. Method for providing an AI human video generation interface

The description of FIGS. 1 to 6 above covered the components of the user input-based automatic video content production method and the steps performed by each. Hereinafter, embodiments of the editing interface that the editing interface providing unit (13) of the service server (1) provides to the user terminal under the method for providing an AI human video generation interface are described in detail.

FIGS. 7 to 14 show an editing interface for editing an AI human video according to one embodiment of the present invention.

As illustrated in FIGS. 7 to 14, the editing interface includes two layers:

- a preview layer (L1) that provides a preview of the generated AI human video;
- a script layer (L2) that provides editing functions for the subtexts.

The script layer (L2) contains a plurality of subtext layers (L3), one for each subtext. Each subtext layer displays one of the divided subtexts and provides editing functions for it. Each subtext layer (L3) may also show how long its subtext plays when converted to voice by TTS.

The editing interface further includes two character layers:

- an AI human character selection layer (L6) for selecting the number and models of AI human characters in the AI human video;
- an AI human character setting layer (L7) for setting the style, angle, pose, size, and position of each AI human character.

The AI human character setting layer (L7) may be displayed in either of two cases. One is when the user selects an AI human character shown in the AI human video playing in the preview playback layer (L8). The other is when the user makes a selection in the summary block corresponding to a subtext.

In addition, a gesture input layer may be displayed in the subtext layer (L3). The gesture input layer may include a gesture setting interface. This interface sets the gesture type and expression position of the AI human character in the AI human video for the subtext in that subtext layer (L3).

The preview layer (L1) includes two layers:

- a preview playback layer (L8), which displays icons for playback operations such as play and stop of the AI human video;
- a time-series summary layer (L9), which summarizes information about the AI human video shown in the preview playback layer (L8) along its timeline.

The time-series summary layer (L9) may include a timeline layer (L10) containing a manipulation-axis element. Moving this element moves the playback position of the video in the preview playback layer (L8).

The time-series summary layer (L9) also includes a script summary information layer. This layer shows, along the timeline, information about the AI human video at each point in time. It contains one summary block per subtext, and each summary block may include information about the AI human character and the script for that subtext.

When a gesture type and expression position are set through the gesture input layer for the subtext in a subtext layer (L3), information about that gesture may be overlaid on the subtext's summary block. The overlay appears at the location within the block that corresponds to the set expression position.

The editing interface further includes a background setting layer. It sets background information for the AI human video, namely at least one of a background image, a background video, and background audio. The preview playback layer (L8) may then display the AI human video with that background information applied.

As illustrated in FIG. 7, the editing interface may include a preview layer (L1) and a script layer (L2). The preview layer (L1) may provide a preview of the AI human video. The script layer (L2) may display information about each subtext from which the AI human video is generated and provide functions to edit it.

The AI human video may also include a title, subtitles, and generated images. Their layout may be set in the automatic slide generation step, based on which of several layout templates the user terminal selects. The title, subtitle, and generated image are taken from the outputs of the automatic data generation step: the title, the subtitles, and the generated images. Preferably, each slide displays one title, one subtitle, and one generated image.

The AI human video may further display captions. In that case, the layout of the title, subtitle, generated image, and captions is likewise set in the automatic slide generation step from the user terminal's template selection. Preferably, each subtext extracted in the automatic data generation step is displayed as a caption in the AI human video. The subtext shown in a subtext layer (L3) in FIG. 7 is both the script the AI human speaks in the AI human video and the caption displayed at the bottom of the video.
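The rule that each slide carries one title, one subtitle, one generated image, and the subtext spoken as its caption can be expressed as a simple pairing step. This is a minimal sketch. The `Slide` record, the `build_slides` helper, and the use of image file paths are hypothetical illustrations, not the patent's data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slide:
    title: str        # title of the whole video, shared by every slide
    subtitle: str     # one subtitle per slide
    image_path: str   # one generated image per slide (hypothetical file path)
    subtext: str      # spoken by the AI human and shown as the caption


def build_slides(title: str, subtitles: List[str], images: List[str], subtexts: List[str]) -> List[Slide]:
    """Pair the i-th subtitle, generated image and subtext into the i-th slide."""
    if not (len(subtitles) == len(images) == len(subtexts)):
        raise ValueError("each slide needs one subtitle, one generated image and one subtext")
    return [Slide(title, sub, img, text) for sub, img, text in zip(subtitles, images, subtexts)]
```

Checking the lengths up front is a design choice. It makes a mismatch between the extracted subtitles, images, and subtexts fail loudly. Without the check, `zip` would silently drop trailing slides.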

The L2 region in FIG. 8 shows the script layer (L2). As shown, the script layer (L2) may include a plurality of subtext layers (L3). Each one displays one of the subtexts into which the text has been divided and provides editing functions for that subtext.

Each subtext layer (L3) may include a subtext display layer (L4) that shows the subtext. By editing the subtext in the subtext display layer (L4), the user can modify the AI human video for that subtext. For example, the subtext is the script the AI human character speaks in the AI human video, so editing the subtext changes what the character says.

When a subtext is modified on the user terminal, the service server (1) may regenerate the TTS voice for the modified subtext. It may then regenerate and display the AI human video based on that voice.

The service server (1) may also display the playback time of the regenerated voice in the subtext layer (L3) of the modified subtext.

In one embodiment of the present invention, the user may edit a subtext so that its voice playback time exceeds the preset time interval. In that case, the service server (1) may divide the modified subtext into several subtexts and regenerate an AI human video for each of them.
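The edit-and-split behavior described above can be sketched as a greedy packing of whole sentences. This is a minimal sketch under stated assumptions. The real TTS playback time is not available here, so `estimate_tts_seconds` is a hypothetical stand-in that assumes a constant speaking rate. Sentence boundaries are assumed to be `.`, `!`, or `?` followed by whitespace. The patent does not state how the split points are chosen.

```python
import re
from typing import Callable, List, Tuple


def estimate_tts_seconds(text: str, chars_per_second: float = 7.0) -> float:
    """Hypothetical stand-in for the real TTS duration: assumes a constant speaking rate."""
    return len(text.replace(" ", "")) / chars_per_second


def split_subtext(text: str, max_seconds: float = 60.0,
                  duration: Callable[[str], float] = estimate_tts_seconds) -> List[str]:
    """Greedily pack whole sentences into chunks whose duration stays within max_seconds.

    A single sentence that is longer than max_seconds on its own is kept as one chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and duration(candidate) > max_seconds:
            chunks.append(current)  # close the current chunk before it overflows
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def on_subtext_edited(new_text: str, max_seconds: float = 60.0,
                      duration: Callable[[str], float] = estimate_tts_seconds) -> List[Tuple[str, float]]:
    """Return (subtext, seconds) pairs to regenerate; split only when the limit is exceeded."""
    if duration(new_text) <= max_seconds:
        parts = [new_text]
    else:
        parts = split_subtext(new_text, max_seconds, duration)
    return [(part, duration(part)) for part in parts]
```

Passing the duration function as a parameter lets a real TTS engine's measured playback time replace the estimate without changing the splitting logic. Each returned pair would drive one regenerated AI human video and the playback time shown in its subtext layer.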

Each subtext layer (L3) may further include an E2 element for playing the voice of its subtext. By selecting the E2 element, the user can listen to that subtext's voice.

As described above, the subtexts may be generated in two steps. The text extracted from the target document file is first converted to speech with TTS. The speech is then divided so that each part's playback time falls within the preset time interval. The E2 element may display the playback time of the TTS voice for a subtext generated in this way.

In other words, the playback time shown by the E2 element may correspond to the playback time of the AI human video generated from that subtext.

This lets the user check the playback time of the AI human video for each subtext intuitively and easily.

In one embodiment, the preset time interval used to divide the text into subtexts may correspond to the maximum playback time of a single AI human video that the service server (1) can generate. For example, suppose the service server (1) converts the text to speech with TTS and divides the speech into 1-minute intervals to generate the subtexts. Each AI human video it generates is then at most 1 minute long.

The service server (1) displays the voice playback time of each subtext on the user terminal. This helps the user keep each subtext within the preset time interval (1 minute).

The script layer (L2) may further include an L5 layer. It plays the voice of the whole AI human video, as generated from the subtexts of all the subtext layers (L3) in the script.

As described above, the text extracted from the target document file is converted to speech and divided into subtexts. An AI human video may be generated for each subtext whose playback time does not exceed the preset interval. The voice of each subtext can be heard through the E2 element in that subtext's layer (L3).

Beyond the per-subtext playback of the E2 elements, the service server (1) can also offer full-length playback through the L5 layer. There the user hears the complete voice obtained by merging the voices of all the subtexts.
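Merging the per-subtext voices into one track, as the L5 layer is described as doing, amounts to concatenating the clips in script order. This is a minimal sketch with assumptions. It assumes every TTS clip is an uncompressed WAV file with the same channel count, sample width, and sample rate, and it uses Python's standard `wave` module. The patent does not specify the audio format, and `merge_wav_files` is a hypothetical helper name.

```python
import wave
from typing import Sequence


def merge_wav_files(paths: Sequence[str], out_path: str) -> float:
    """Concatenate WAV clips that share one format, in order; return the total duration in seconds."""
    if not paths:
        raise ValueError("no clips to merge")
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()
    total_frames = 0
    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # the frame count in the header is corrected when the file is closed
        for path in paths:
            with wave.open(path, "rb") as clip:
                # Compare channels, sample width and frame rate.
                if clip.getparams()[:3] != params[:3]:
                    raise ValueError(f"{path}: channels/sample width/frame rate differ")
                out.writeframes(clip.readframes(clip.getnframes()))
                total_frames += clip.getnframes()
    return total_frames / params.framerate
```

The returned duration should equal the sum of the per-subtext playback times shown by the E2 elements. This gives a cheap consistency check between the per-subtext and full-length views.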

FIG. 9(a) shows the AI human character selection layer (L6), which sets the number and models of AI human characters in the AI human video. As shown, in this layer the user chooses how many AI human characters appear in the video and which model types they use.

FIG. 9(b) shows the AI human character setting layer (L7). It sets the style, angle, pose, size, and position of an AI human character in the AI human video. As shown, the layer may be provided for setting at least one property of an AI human character selected in the AI human character selection layer (L6). These properties are whether the character is hidden, its style, angle, pose, size, and position.

The AI human character setting layer (L7) further includes an E3 element for changing the AI human character in the AI human video for the corresponding subtext. When the E3 element is selected, the AI human character selection layer (L6) may be displayed on the user terminal.

The AI human character setting layer (L7) may be displayed in either of two cases. One is when the user selects an AI human character shown in the AI human video playing in the preview playback layer (L8). The other is when the user selects the summary block corresponding to a subtext.

Specifically, as described above, the preview playback layer (L8) can play the AI human video in which the AI human character speaks a subtext. When the user selects an AI human character shown in that video, the AI human character setting layer (L7) may be displayed on the user terminal. The user can then set and change detailed information about that character.

An AI human video is also generated for each subtext, and a summary block may be displayed for each subtext. Selecting a subtext's summary block may likewise display the AI human character setting layer (L7).

For example, the user may select the name of an AI human character shown in a summary block. The AI human character setting layer (L7) for that character may then be displayed on the user terminal, and the user can set and change detailed information about it.

FIG. 10 shows the preview layer (L1), which provides a preview of the AI human video. Specifically, the preview layer (L1) may include a preview playback layer (L8) that displays icons for playback operations such as play and stop.

As described above, the service server (1) may provide the user terminal with an interface for uploading a target document file. The same interface lets the user enter detailed settings for the AI human character. The service server (1) may then generate an AI human video for the text of the target document file based on the information entered on the user terminal.

The preview playback layer (L8) may be the layer in which the AI human video generated in this way is displayed. Specifically, it may include an E4 element containing icons for entering playback actions such as play and stop.

The preview layer (L1) may include a time-series summary layer (L9). This layer summarizes information about the AI human video shown in the preview playback layer (L8) in chronological order. Specifically, the time-series summary layer (L9) may include a timeline layer (L10) for changing the playback position of the AI human video in the preview playback layer (L8). Preferably, the timeline layer (L10) is displayed as a time axis.

As shown, the user can change the playback position of the AI human video in the preview playback layer (L8). This is done by dragging the manipulation-axis element (E5) on the timeline layer (L10) left or right. For example, moving the manipulation-axis element (E5) to the right moves playback to a later point in the AI human video.
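Scrubbing with the manipulation-axis element can be read as two steps. First, the handle's horizontal position is mapped to a playback time. Second, the per-subtext clip playing at that time is found. This is a minimal sketch that assumes a linear time axis in pixels and subtext clips laid end to end. The function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def scrub_to_time(handle_x: float, track_width: float, total_seconds: float) -> float:
    """Map the handle's horizontal position on the timeline to a playback time (clamped to the track)."""
    ratio = min(max(handle_x / track_width, 0.0), 1.0)
    return ratio * total_seconds


def locate_subtext(time_s: float, durations: List[float]) -> Tuple[int, float]:
    """Return (index of the subtext clip playing at time_s, offset within that clip)."""
    if not durations:
        raise ValueError("no subtext clips")
    ends = list(accumulate(durations))  # cumulative end time of each clip
    index = min(bisect_right(ends, time_s), len(durations) - 1)
    start = ends[index] - durations[index]
    return index, time_s - start
```

As an illustration, take two clips of 17 s and 19 s on a 720-pixel track. Dragging the handle to x = 360 gives 18.0 s, which falls 1.0 s into the second clip.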

The time-series summary layer (L9) of the preview layer (L1) may include a script summary information layer (L11). This layer shows, along the timeline, information about the AI human video at each point in time. Specifically, it may include one summary block (B1) for each subtext.

Preferably, each summary block (B1) includes information about the AI human character and the script for the corresponding subtext.

As described above, the service server (1) can divide the text extracted from the target document file into a plurality of subtexts and generate an AI human video for each of them. The summary blocks show information about the AI human video for each of these subtexts.

Specifically, a summary block may include three pieces of information:

- the name (model name) of the AI human character in the AI human video for the subtext;
- the script of the subtext;
- the playback time of the subtext's TTS voice.

For example, in FIG. 10, the first summary block (B1) describes the AI human video that plays from 00:00 to 00:17. The second summary block (B1) describes the AI human video that plays from 00:17 to 00:36.

When the user selects the name of the AI human character shown in a summary block (B1), the AI human character setting layer (L7) for that character may be displayed on the user terminal.

In one embodiment, the shape of each summary block may be determined so that it reflects the TTS playback time of its subtext.

For example, the size of each summary block may be proportional to the TTS playback time of its subtext.

Preferably, the timeline layer (L10) is a horizontal time axis. The width of each summary block is proportional to the TTS playback time of its subtext. The user can therefore judge at a glance, from each block's width, how long the AI human video for the corresponding subtext plays.
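The proportional layout of summary blocks can be sketched as a running sum of TTS playback times. This is a minimal sketch that assumes a fixed pixels-per-second scale. `SummaryBlock`, `build_summary_blocks`, and `mmss` are hypothetical names. Each block's start time is the end time of the previous one, and its width and horizontal offset are scaled from its duration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class SummaryBlock:
    character: str   # AI human character (model) name shown in the block
    script: str      # subtext spoken in this clip
    start_s: float
    end_s: float
    x_px: float      # left edge on the horizontal timeline
    width_px: float  # proportional to the TTS playback time


def mmss(seconds: float) -> str:
    """Format seconds as MM:SS, rounded to the nearest second."""
    total = int(round(seconds))
    return f"{total // 60:02d}:{total % 60:02d}"


def build_summary_blocks(items: Iterable[Tuple[str, str, float]],
                         px_per_second: float = 20.0) -> List[SummaryBlock]:
    """items: (character, script, tts_seconds) per subtext, in script order."""
    blocks: List[SummaryBlock] = []
    cursor = 0.0
    for character, script, seconds in items:
        blocks.append(SummaryBlock(character, script, cursor, cursor + seconds,
                                   cursor * px_per_second, seconds * px_per_second))
        cursor += seconds
    return blocks
```

With durations of 17 s and 19 s, the blocks cover 00:00 to 00:17 and 00:17 to 00:36, the ranges given for FIG. 10. At the assumed 20 px per second they are 340 px and 380 px wide.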

The preview playback layer (L8) is the layer in which the AI human video is displayed. It may include the title, subtitle, generated image, and AI human. Input from the user terminal through the editing interface can adjust the position and size of these elements. One or more of the title, subtitle, and generated image may also be deleted from the slide and the AI human video. Captions may also be shown in the preview playback layer (L8) and the AI human video. Preferably, the subtexts extracted in the automatic data generation step appear as captions in the AI human video. In one embodiment, captions can be inserted or deleted by input from the user terminal through the editing interface. Once a caption is inserted, its position and size can be adjusted.

As described above, the user can configure the AI human characters in each of the AI human videos generated for the respective subtexts. For example, in FIG. 11, the user assigns different AI human character models to the AI human videos of different subtexts. This creates an AI human video in which two or more AI human characters hold a conversation.

In this case, the preview playback layer (L8) may display an AI human video that includes multiple AI human characters, C1 and C2.

The script summary information layer (L11) may also display summary blocks B1 and B2 for AI human characters C1 and C2, respectively. Summary block B1 shows information about AI human character C1, and summary block B2 shows information about AI human character C2.

In one embodiment, summary blocks are colored by AI human character across the AI human videos for all subtexts. Blocks for subtexts assigned to the same character may share the same color. Blocks for subtexts assigned to different characters may have different colors.

For example, in FIG. 11, summary blocks B1 and B2 for the different AI human characters C1 and C2 may be shown in different colors. Coloring them differently lets the user see at a glance which AI human character each summary block refers to.
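The coloring rule (same character, same color; different characters, different colors) can be implemented with a lookup filled on first use. This is a minimal sketch. The palette values and the `assign_block_colors` name are hypothetical. When there are more characters than palette entries, colors start to repeat, so the palette must be at least as long as the number of characters for the "different characters, different colors" rule to hold.

```python
from itertools import cycle
from typing import Dict, List

# Hypothetical palette; any set of visually distinct colors would do.
PALETTE = ["#4C78A8", "#F58518", "#54A24B", "#E45756", "#72B7B2", "#B279A2"]


def assign_block_colors(characters: List[str], palette: List[str] = PALETTE) -> List[str]:
    """Return one color per summary block: a character keeps the color it was first given."""
    colors = cycle(palette)
    by_character: Dict[str, str] = {}
    result: List[str] = []
    for name in characters:
        if name not in by_character:
            by_character[name] = next(colors)  # first appearance takes the next palette color
        result.append(by_character[name])
    return result
```

For a conversation in which C1 and C2 alternate, the blocks alternate between two colors, matching the B1 and B2 distinction in FIG. 11.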

As described above, the AI human character setting layer (L7) for a character may be displayed in either of two cases. One is when the user selects that character in the AI human video playing in the preview playback layer (L8). The other is when the user selects the character's summary block in the script summary information layer (L11).

For example, the user can set detailed information about AI human character C1 in either of two ways. The user can select C1 in the preview playback layer (L8). Alternatively, the user can select its summary block B1 in the script summary information layer (L11), preferably the name of C1 shown in summary block B1.

As shown in FIG. 12, the user can set gestures for an AI human character in the AI human video.

As described above, the script layer (L2) may include a subtext layer (L3) for each subtext, and the service server (1) may generate an AI human video for each subtext.

As shown in FIG. 12(a), each subtext layer (L3) may include an E6 element that opens a gesture input layer (L12). In the gesture input layer, a gesture can be set for the AI human character in the AI human video for that subtext.

When the user selects the E6 element, the gesture input layer (L12) may be displayed. The user can then set the gesture type and expression position of the AI human character for that subtext.

Specifically, the gesture type may include information such as which hand the AI human character uses, or which action or facial expression it makes. The expression position is the point in the AI human video for that subtext at which the AI human character performs the gesture.

For example, the user can set the AI human character to use its "right hand" at the "before sentence" position of a subtext.

Because the AI human character performs the gestures the user set, the AI human video becomes more realistic and better matches the user's intent.

As shown in FIG. 12(b), when a gesture is set for the AI human character in the gesture input layer (L12), information about that gesture may be displayed in the summary block (B1) for the corresponding subtext.

Specifically, an E7 element showing the gesture information may be overlaid on the summary block (B1). It is placed at the location within the block that corresponds to the gesture's expression position.

For example, suppose that in FIG. 12(a) a gesture for a particular subtext has its expression position set to "before sentence" and its type set to "right hand". The E7 element showing "right hand" may then appear in that subtext's summary block at the location matching "before sentence", which is the front (left side) of the block.

Gestures set in the gesture input layer (L12) are shown in the matching summary blocks (B1). The user can therefore see at a glance which gesture was set for which subtext, and at which expression position.

In one embodiment, the user can delete the E7 element from a summary block (B1). This removes the gesture set for the corresponding subtext, so the gesture is no longer applied.

Thus, the user can delete a gesture directly from the summary block (B1) of its subtext. There is no need to select the E6 element and reopen the gesture input layer (L12).
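Placing the E7 marker and deleting a gesture from the block can be sketched as follows. This is a minimal sketch with assumptions. Only the "before sentence" position comes from the text. The `mid_sentence` and `after_sentence` positions, the fraction mapping, and all class and function names are hypothetical. Each position is mapped to a fraction of the block width, and deleting the marker removes the gesture from the list that would be applied to the subtext.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical mapping from expression position to a fraction of the block width.
# Only "before sentence" appears in the description; the other two are illustrative.
POSITION_FRACTION = {"before_sentence": 0.0, "mid_sentence": 0.5, "after_sentence": 1.0}


@dataclass
class Gesture:
    kind: str      # gesture type, e.g. "right_hand"
    position: str  # key of POSITION_FRACTION


@dataclass
class Block:
    x_px: float      # left edge of the summary block on the timeline
    width_px: float  # block width (proportional to TTS playback time)
    gestures: List[Gesture] = field(default_factory=list)


def overlay_x(block: Block, gesture: Gesture) -> float:
    """Horizontal pixel position at which the gesture marker (E7) is drawn inside the block."""
    return block.x_px + POSITION_FRACTION[gesture.position] * block.width_px


def delete_gesture(block: Block, index: int) -> None:
    """Deleting the marker removes the gesture, so it is no longer applied to the subtext."""
    del block.gestures[index]
```

For a block starting at 340 px and 380 px wide, a "before sentence" right-hand gesture is drawn at 340 px, the block's left edge. A mid-sentence gesture would sit at 530 px.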

As illustrated in FIG. 13, the service server (1) may provide the user terminal with a background setting layer for setting background information of the AI human video, including at least one of a background image, a background video, and a background sound source.

Specifically, FIG. 13(a) shows the layer displayed on the user terminal for setting the background image of the AI human video, FIG. 13(b) shows the layer for setting the background video, and FIG. 13(c) shows the layer for setting the background sound source.

In this way, the user can insert one or more of a background image, a background video, and a background sound source into the background of the AI human video.
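The requirement that the background information contain at least one of a background image, a background video, and a background sound source can be expressed as a simple validation rule. In this sketch the class name and the use of file-path strings for each asset are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackgroundInfo:
    """Hypothetical container for the background settings of FIG. 13."""
    image: Optional[str] = None   # background image, FIG. 13(a)
    video: Optional[str] = None   # background video, FIG. 13(b)
    sound: Optional[str] = None   # background sound source, FIG. 13(c)

    def __post_init__(self) -> None:
        # At least one of the three elements must be set.
        if self.image is None and self.video is None and self.sound is None:
            raise ValueError("set at least one of image, video, or sound")


bg = BackgroundInfo(image="classroom.png", sound="calm_piano.mp3")
print(bg)
```

Making the class frozen means a changed setting produces a new value rather than mutating one already attached to a rendered preview; that is a design choice of the sketch, not something the text specifies.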

As illustrated in FIG. 14, when background information for the AI human video is set in the background setting layer, the AI human video reflecting that background information may be displayed in the preview playback layer (L8).

The user may also modify the applied background information within the AI human video displayed in the preview playback layer (L8). For example, the user may select a background image overlaid on the AI human video in the preview playback layer (L8) and then duplicate it, delete it, or set its position and other properties.
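The editing operations on a background image overlaid in the preview playback layer (L8), namely duplicating, deleting, and repositioning it, can be sketched as operations on a list of overlays. The class names, the pixel coordinates, and the fixed offset applied to duplicates are hypothetical choices for this illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class Overlay:
    """A background image overlaid on the preview (hypothetical model)."""
    asset: str
    x: int  # top-left position in preview pixels
    y: int


class PreviewBackground:
    def __init__(self) -> None:
        self.overlays: List[Overlay] = []

    def duplicate(self, index: int, offset: int = 20) -> Overlay:
        """Copy an overlay, shifting the copy so it does not hide the original."""
        src = self.overlays[index]
        clone = replace(src, x=src.x + offset, y=src.y + offset)
        self.overlays.append(clone)
        return clone

    def delete(self, index: int) -> None:
        del self.overlays[index]

    def set_position(self, index: int, x: int, y: int) -> None:
        self.overlays[index].x = x
        self.overlays[index].y = y


preview = PreviewBackground()
preview.overlays.append(Overlay("classroom.png", 100, 50))
preview.duplicate(0)
preview.set_position(0, 0, 0)
print(preview.overlays)
```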

FIG. 15 illustrates an exemplary internal configuration of a computing device (11000) according to one embodiment of the present invention.

The service server (1) described with reference to FIG. 1 may include the components of the computing device (11000) shown in FIG. 15, which is described below.

As illustrated in FIG. 15, the computing device (11000) may include at least a processor (11100), a memory (11200), a peripheral interface (11300), an input/output (I/O) subsystem (11400), a power circuit (11500), and a communication circuit (11600).

Specifically, the memory (11200) may include, for example, high-speed random access memory, a magnetic disk, SRAM, DRAM, ROM, flash memory, or other non-volatile memory. The memory (11200) may store software modules, instruction sets, or various other data required for the operation of the computing device (11000).

Access to the memory (11200) by components such as the processor (11100) or the peripheral interface (11300) may be controlled by the processor (11100). The processor (11100) may consist of a single processor or multiple processors, and may include GPU- or TPU-type processors to increase processing speed.

The peripheral interface (11300) may couple the input and/or output peripherals of the computing device (11000) to the processor (11100) and the memory (11200). The processor (11100) may execute the software modules or instruction sets stored in the memory (11200) to perform various functions of the computing device (11000) and to process data.

The I/O subsystem (11400) may couple various I/O peripherals to the peripheral interface (11300). For example, the I/O subsystem (11400) may include a controller for coupling peripherals such as a monitor, keyboard, mouse, or printer, or, where needed, a touchscreen or sensor, to the peripheral interface (11300). In another aspect, I/O peripherals may be coupled to the peripheral interface (11300) directly, without passing through the I/O subsystem (11400).

The power circuit (11500) may supply power to all or some of the components of the terminal. For example, the power circuit (11500) may include a power management system, one or more power sources such as a battery or alternating current (AC), a charging system, a power failure detection circuit, a power converter or inverter, a power status indicator, or any other components for generating, managing, and distributing power.

The communication circuit (11600) may enable communication with other computing devices through at least one external port. Alternatively, as noted above, the communication circuit (11600) may, if necessary, include an RF circuit and communicate with other computing devices by transmitting and receiving RF signals, also known as electromagnetic signals.

The embodiment of FIG. 15 is merely one example of the computing device (11000). The computing device (11000) may omit some of the components shown in FIG. 15, include additional components not shown in FIG. 15, or have a configuration or arrangement that combines two or more components. For example, a computing device for a mobile communication terminal may further include a touchscreen, sensors, and the like in addition to the components shown in FIG. 15, and the communication circuit (11600) may include circuitry for RF communication using various schemes (Wi-Fi, 3G, LTE, 5G, 6G, Bluetooth, NFC, Zigbee, etc.). The components that can be included in the computing device (11000) may be implemented in hardware, including one or more signal-processing or application-specific integrated circuits, in software, or in a combination of hardware and software.

The methods according to embodiments of the present invention may be implemented as program instructions executable by various computing devices and recorded on a computer-readable medium. In particular, the program according to this embodiment may be configured as a PC-based program or as an application dedicated to mobile terminals. An application to which the present invention is applied may be installed on a user terminal using a file provided by a file distribution system. For example, the file distribution system may include a file transmission unit (not shown) that transmits the file in response to a request from the user terminal.

The devices described above may be implemented as hardware components, software components, and/or combinations of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on that OS. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, the processing device is sometimes described as a single device, but those skilled in the art will appreciate that it may include multiple processing elements and/or multiple types of processing elements. For example, the processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or command the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to it. The software may be distributed over network-connected computing devices and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code generated by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, those skilled in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents. Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

A method for automatically producing video content based on user input, performed in a computing system including one or more processors and one or more memories, the method comprising:
a data automatic generation step of extracting a title, a plurality of subtitles, and a plurality of subtexts based on user text input from a user terminal, and generating one or more generated images related to the plurality of subtitles and the plurality of subtexts;
a slide automatic generation step of generating a plurality of slides including the title, the subtitles, and the generated images; and
an AI human video generation step of generating an AI human video including the slides and an AI human uttering the subtexts, and providing an editing interface to the user terminal.
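The first two steps of claim 1 can be illustrated as a data flow from user text to slides. In this sketch the large language model and the image generation model are passed in as plain callables because the claim does not name specific models; the `Outline` and `Slide` types and the stub functions are hypothetical, and the third step (rendering the AI human video and providing the editing interface) is outside its scope.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Outline:
    title: str
    subtitles: List[str]
    subtexts: List[str]   # scripts the AI human will speak


@dataclass
class Slide:
    title: str
    subtitle: str
    images: List[str]     # generated images (here: identifiers)
    script: str           # subtext uttered by the AI human for this slide


def generate_slides(user_text: str,
                    extract: Callable[[str], Outline],
                    generate_images: Callable[[str], List[str]]) -> List[Slide]:
    # Data automatic generation step: title, subtitles, subtexts, images.
    outline = extract(user_text)
    slides = []
    for subtitle, subtext in zip(outline.subtitles, outline.subtexts):
        images = generate_images(f"{subtitle}: {subtext}")
        # Slide automatic generation step: one slide per subtitle/subtext pair.
        slides.append(Slide(outline.title, subtitle, images, subtext))
    return slides


# Stub models standing in for an LLM and an image generator.
def fake_llm(text: str) -> Outline:
    return Outline(text.title(), ["Origins", "Uses"],
                   ["Tea began in China.", "Tea is brewed worldwide."])


def fake_images(prompt: str) -> List[str]:
    return [f"img<{prompt.split(':')[0]}>"]


slides = generate_slides("history of tea", fake_llm, fake_images)
for s in slides:
    print(s.subtitle, s.images, s.script)
```

Injecting the models as callables keeps the pipeline testable with stubs and leaves open whether the models run inside or outside the service server.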

The method of claim 1, wherein the AI human video generation step comprises:
a layout setting step of providing the user terminal with a layout adjustment interface for adjusting the size and position of the title, the subtitles, and the generated images included in the AI human video;
an AI human character setting step of setting, according to the user's input, information about an AI human character to be implemented as multiple images; and
an editing interface providing step of providing the user terminal with an editing interface for creating the AI human video.

The method of claim 2, wherein the layout adjustment interface comprises:
a title layer that displays the title of the entire video;
a subtitle layer that displays the subtitle of each slide;
a generated image layer that displays the generated images automatically placed on each slide;
an AI human layer that displays the AI human uttering the subtext of each slide; and
a caption layer that displays the subtext as captions of the AI human video;
wherein the position and size of each of the title layer, the subtitle layer, the generated image layer, the AI human layer, and the caption layer are adjustable through the layout adjustment interface.
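Adjusting the position and size of a layer through the layout adjustment interface can be sketched as moving and scaling a bounding box while keeping it inside the video frame. The 1920x1080 frame, the layer names, and the clamping policy are assumptions for illustration; the claim does not specify them.

```python
from dataclasses import dataclass

FRAME_W, FRAME_H = 1920, 1080  # assumed output resolution


@dataclass
class LayerBox:
    name: str  # "title", "subtitle", "generated_image", "ai_human", or "caption"
    x: int
    y: int
    w: int
    h: int


def _clamp(value: int, low: int, high: int) -> int:
    return max(low, min(value, high))


def adjust_layer(box: LayerBox, dx: int, dy: int, scale: float = 1.0) -> LayerBox:
    """Move a layer by (dx, dy) and scale it, keeping it fully inside the frame."""
    w = _clamp(round(box.w * scale), 1, FRAME_W)
    h = _clamp(round(box.h * scale), 1, FRAME_H)
    x = _clamp(box.x + dx, 0, FRAME_W - w)
    y = _clamp(box.y + dy, 0, FRAME_H - h)
    return LayerBox(box.name, x, y, w, h)


ai_human = LayerBox("ai_human", 1400, 500, 400, 560)
print(adjust_layer(ai_human, dx=300, dy=0, scale=1.2))
```

Here the enlarged AI human layer would overflow the right and bottom edges, so the sketch pushes it back inside the frame rather than cropping it.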

The method of claim 1, wherein the data automatic generation step comprises:
a user text receiving step of receiving, from the user terminal, user text related to the title or topic of the AI human video;
a subtext extraction step of inputting the user text into a large language model inside or outside the service server to extract a title, a plurality of subtitles, and a plurality of subtexts to be included in the slides of the AI human video; and
an image generation step of inputting each of the plurality of subtitles and the plurality of subtexts into a deep learning-based image generation model inside or outside the service server to generate one or more generated images corresponding to each of them.
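The subtext extraction step can be sketched as prompting a large language model for structured output and parsing its reply. The JSON schema, the prompt wording, and the `build_prompt` and `parse_outline` helpers are hypothetical; the claim states only that the user text is input into a large language model and that a title, subtitles, and subtexts are extracted. The model call itself is not shown.

```python
import json
from typing import List, Tuple

# Hypothetical prompt asking the model for a fixed JSON shape.
PROMPT_TEMPLATE = (
    "Write a short presentation about: {topic}\n"
    'Reply with JSON only: {{"title": str, "sections": '
    '[{{"subtitle": str, "script": str}}, ...]}}'
)


def build_prompt(user_text: str) -> str:
    return PROMPT_TEMPLATE.format(topic=user_text)


def parse_outline(reply: str) -> Tuple[str, List[str], List[str]]:
    """Parse the model's JSON reply into (title, subtitles, subtexts)."""
    data = json.loads(reply)
    sections = data["sections"]
    if not sections:
        raise ValueError("model returned no sections")
    subtitles = [s["subtitle"] for s in sections]
    subtexts = [s["script"] for s in sections]
    return data["title"], subtitles, subtexts


reply = ('{"title": "History of Tea", "sections": ['
         '{"subtitle": "Origins", "script": "Tea began in China."}, '
         '{"subtitle": "Spread", "script": "Trade carried tea west."}]}')
print(parse_outline(reply))
```

Asking for JSON keeps subtitles and subtexts paired one-to-one, which the later image generation and slide steps rely on.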

The method of claim 1, wherein the slide automatic generation step comprises:
providing the user terminal with a plurality of layout templates in which the title, the subtitles, and the generated images are arranged in different layouts; and
generating the plurality of slides by arranging the title, the subtitles, and the generated images according to the layout template selected at the user terminal.
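Arranging the title, subtitle, and generated images according to the selected layout template can be sketched as filling named slots. The two templates, their slot coordinates, and the dictionary representation are hypothetical, and the sketch places only the first generated image.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height in pixels

# Hypothetical layout templates offered to the user terminal.
TEMPLATES: Dict[str, Dict[str, Box]] = {
    "image_right": {"title": (80, 40, 1760, 100), "subtitle": (80, 180, 900, 80),
                    "image": (1040, 180, 800, 600)},
    "image_top": {"title": (80, 40, 1760, 100), "image": (460, 160, 1000, 560),
                  "subtitle": (80, 760, 1760, 80)},
}


def arrange_slide(template_name: str, title: str, subtitle: str,
                  images: List[str]) -> List[Tuple[str, str, Box]]:
    """Place slide content into the slots of the template selected by the user."""
    slots = TEMPLATES[template_name]
    placed = [("title", title, slots["title"]),
              ("subtitle", subtitle, slots["subtitle"])]
    if images:
        # This sketch shows only the first generated image in the image slot.
        placed.append(("image", images[0], slots["image"]))
    return placed


for element in arrange_slide("image_right", "History of Tea", "Origins", ["img_001.png"]):
    print(element)
```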

The method of claim 1, wherein the subtext is a script to be spoken by the AI human;
the editing interface comprises:
a preview layer that provides a preview of the generated AI human video; and
a script layer that provides editing functions for the subtexts;
the script layer comprises:
a plurality of subtext layers, each corresponding to one of the subtexts, that display the respective subtexts and provide editing functions for them; and
each subtext layer displays the voice playback time obtained when the subtext it contains is converted into speech by text-to-speech (TTS).
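Displaying the voice playback time of a subtext requires the duration of the synthesized speech. Assuming, hypothetically, that the TTS engine returns PCM audio in WAV format, the duration is the frame count divided by the sample rate, both of which Python's standard `wave` module exposes. The TTS call itself is not shown, and the m:ss display format is an assumption.

```python
import io
import wave


def tts_playback_seconds(wav_bytes: bytes) -> float:
    """Playback time of a WAV clip: number of frames / frames per second."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def format_playback_time(seconds: float) -> str:
    """Render seconds as m:ss for display next to the subtext."""
    total = round(seconds)
    return f"{total // 60}:{total % 60:02d}"


# Stand-in for TTS output: 2 seconds of 16 kHz, 16-bit mono silence.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 32000)

seconds = tts_playback_seconds(buf.getvalue())
print(seconds, format_playback_time(seconds))  # 2.0 0:02
```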

The method of claim 6, wherein the subtext layer is capable of displaying a gesture input layer;
the gesture input layer includes a gesture setting interface for setting, for the subtext included in the corresponding subtext layer, the gesture type and expression location of the AI human character in the AI human video; and
when the gesture type and expression location for the subtext included in the corresponding subtext layer are set through the gesture input layer,
information about the corresponding gesture is overlaid, in the summary block for that subtext,
at the detailed position of the summary block corresponding to the set expression location.

The method of claim 6, wherein the preview layer comprises:
a preview playback layer that displays icons for playback operations, including playing and stopping the AI human video; and
a time-series summary layer that summarizes and displays information along the time series of the AI human video shown in the preview playback layer;
wherein the time-series summary layer comprises
a timeline layer in which the playback point of the video in the preview playback layer moves as an included manipulation axis element is moved.
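In the timeline layer, moving the manipulation axis element changes the playback point of the preview. A linear mapping between the axis's horizontal position and playback time is one straightforward way to realize this; the pixel-based track width and the function names are assumptions, not details given in the claim.

```python
def axis_to_time(handle_x: float, track_width: float, duration: float) -> float:
    """Map the manipulation axis position on the timeline to a playback time (s)."""
    if track_width <= 0:
        raise ValueError("track_width must be positive")
    fraction = min(max(handle_x / track_width, 0.0), 1.0)  # clamp to the track
    return fraction * duration


def time_to_axis(t: float, track_width: float, duration: float) -> float:
    """Inverse mapping: where to draw the axis for playback time t."""
    if duration <= 0:
        return 0.0
    return min(max(t / duration, 0.0), 1.0) * track_width


print(axis_to_time(300, 600, 90.0))
print(time_to_axis(45.0, 600, 90.0))
```

The inverse mapping lets the axis follow the video during normal playback, so dragging and playing share one coordinate system.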

The method of claim 1, wherein the editing interface further comprises:
an AI human character selection layer for selecting the number and models of AI human characters in the AI human video; and
an AI human character setting layer for setting at least one of the style, angle, pose, size, and position of each AI human character;
wherein the AI human character setting layer
is displayed in response to a selection input on an AI human character shown in the AI human video played in the preview playback layer, or a selection input on the summary block corresponding to a subtext.