
CN118474274B - Short video generation method and system based on artificial intelligence - Google Patents


Info

Publication number
CN118474274B
CN118474274B
Authority
CN
China
Prior art keywords
video, auxiliary, area, value, height
Prior art date
Legal status
Active
Application number
CN202410929865.XA
Other languages
Chinese (zh)
Other versions
CN118474274A (en)
Inventor
刘庆秋
刘武丰
刘咏梅
刘婵咏
占义芳
Current Assignee
Guangzhou Hand In Hand Internet Co ltd
Original Assignee
Guangzhou Hand In Hand Internet Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Hand In Hand Internet Co ltd
Priority to CN202410929865.XA
Publication of CN118474274A
Application granted
Publication of CN118474274B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/265 Mixing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N 21/23424 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N 21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N 21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The embodiment of the application provides a short video generation method and system based on artificial intelligence. The method comprises the following steps: acquiring a main video, an auxiliary video and a splicing area selected by a target object; extracting time t1 of the main video and time t2 of the auxiliary video; identifying the main video through a preset LSTM model to determine main audio information corresponding to the main video; identifying a first auxiliary extension video through the preset LSTM model to determine auxiliary audio information corresponding to the first auxiliary extension video; and muting the volume of the auxiliary audio information of the first auxiliary extension video to obtain a second auxiliary extension video. The method further comprises: extracting a first size of a main area of the splicing area; extracting a second size of an auxiliary area of the splicing area; adjusting the size of the second auxiliary extension video to the second size and adding the second auxiliary extension video to the auxiliary area; and, after aligning the start time of the main area video with that of the auxiliary area video, calling a video generation command to synthesize the main area video and the auxiliary area video into one short video.

Description

Short video generation method and system based on artificial intelligence

Technical Field

The present application belongs to the field of communication and video technology, and specifically relates to a short video generation method and system based on artificial intelligence.

Background Art

A short video is a short-form video, a way of disseminating content on the Internet, generally a video of no more than 10 minutes distributed on new Internet media. With the development of short video platforms, many online shopping products are promoted through short videos. Existing short video promotion generally involves filming real people and then having professional editors cut the footage into a short video.

A spliced short video is formed by splicing two videos together, generally a main video and an auxiliary video. The durations of the two videos may differ, and their sizes may also prevent them from being spliced together directly. Existing short video splicing relies on the user making these adjustments manually, but the quality of manual adjustment depends on the user's editing skill, so the resulting spliced short video is often poor.

Summary of the Invention

The present application provides a short video generation method and system based on artificial intelligence, which can automatically generate a spliced video without manual adjustment of duration or size, thereby improving the quality of the short video.

In a first aspect, the present application provides a short video generation method based on artificial intelligence, the method comprising the following steps:

obtaining the main video, auxiliary video and splicing area selected by a target object, and extracting duration t1 of the main video and duration t2 of the auxiliary video; if t1 > t2, extending the auxiliary video by copying t1 - t2 of it to obtain a first auxiliary extended video whose duration is t1; if t1 < t2, cutting t2 - t1 from the auxiliary video to obtain the first auxiliary extended video;

identifying the main video through a preset LSTM model to determine main audio information corresponding to the main video, identifying the first auxiliary extended video through the preset LSTM model to determine auxiliary audio information corresponding to the first auxiliary extended video, and muting the volume of the auxiliary audio information of the first auxiliary extended video to obtain a second auxiliary extended video;

extracting a first size of a main area of the splicing area, adjusting the main video to the first size and adding it to the main area; extracting a second size of an auxiliary area of the splicing area, adjusting the second auxiliary extended video to the second size and adding it to the auxiliary area; and, after aligning the start times of the main area video and the auxiliary area video, calling a video generation command to synthesize the main area video and the auxiliary area video into one short video.

In a second aspect, a short video generation system based on artificial intelligence is provided, the system comprising:

an acquisition unit, configured to obtain the main video, auxiliary video and splicing area selected by a target object, and to extract duration t1 of the main video and duration t2 of the auxiliary video;

an adjustment unit, configured to, if t1 > t2, extend the auxiliary video by copying t1 - t2 of it to obtain a first auxiliary extended video whose duration is t1, and, if t1 < t2, cut t2 - t1 from the auxiliary video to obtain the first auxiliary extended video;

an identification unit, configured to identify the main video through a preset LSTM model to determine main audio information corresponding to the main video, identify the first auxiliary extended video through the preset LSTM model to determine auxiliary audio information corresponding to the first auxiliary extended video, and mute the volume of the auxiliary audio information of the first auxiliary extended video to obtain a second auxiliary extended video;

a splicing generation unit, configured to extract a first size of a main area of the splicing area, adjust the main video to the first size and add it to the main area, extract a second size of an auxiliary area of the splicing area, adjust the second auxiliary extended video to the second size and add it to the auxiliary area, and, after aligning the start times of the main area video and the auxiliary area video, call a video generation command to synthesize the main area video and the auxiliary area video into one short video.

In a third aspect, the present application provides a computer storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to execute some or all of the steps described in the first aspect of the present application.

The embodiments of the present application have the following beneficial effects:

The technical solution provided in the present application obtains the main video, auxiliary video and splicing area selected by a target object and extracts duration t1 of the main video and duration t2 of the auxiliary video. If t1 > t2, the auxiliary video is extended by copying t1 - t2 of it to obtain a first auxiliary extended video whose duration is t1; if t1 < t2, t2 - t1 is cut from the auxiliary video to obtain the first auxiliary extended video. The main video is identified through a preset LSTM model to determine its main audio information, the first auxiliary extended video is identified through the preset LSTM model to determine its auxiliary audio information, and the volume of the auxiliary audio information is muted to obtain a second auxiliary extended video. A first size of the main area of the splicing area is extracted, and the main video is adjusted to the first size and added to the main area; a second size of the auxiliary area is extracted, and the second auxiliary extended video is adjusted to the second size and added to the auxiliary area; after the start times of the main area video and the auxiliary area video are aligned, a video generation command is called to synthesize them into one short video. A video generated in this way preserves the visual continuity of the main video and the auxiliary video, and its audio data contains only the audio of the main video, so overlapping audio tracks cannot degrade the smoothness of audio playback. The solution therefore improves the quality of the spliced video.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings required by the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;

FIG. 2 is a schematic diagram of a technical scenario of a splicing area provided in an embodiment of the present application;

FIG. 3 is a schematic flow chart of a short video generation method based on artificial intelligence provided in Embodiment 1 of the present application;

FIG. 4 is a schematic structural diagram of a short video generation system based on artificial intelligence provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of a hand contour provided in the present application.

DETAILED DESCRIPTION

In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the present application.

The terms "first", "second" and the like in the specification, the claims and the above drawings of the present application are used to distinguish different objects rather than to describe a specific order. In addition, the terms "comprising" and "having" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but optionally further comprises steps or units that are not listed, or other steps or units inherent to the process, method, system, product or device.

Reference to an "embodiment" herein means that a particular feature, structure or characteristic described in conjunction with the embodiment may be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

The following describes the structure of an electronic device involved in the embodiments of the present application.

The present application also provides an electronic device. As shown in FIG. 1, the electronic device includes at least one processor 11 and a memory 12, and may further include a communications interface 14, one or more cameras 15, a display screen 16 and a bus 13. The processor 11, memory 12, camera 15, display screen 16 and communications interface 14 can communicate with one another through the bus 13. The communications interface 14 can transmit information and may have a wireless communication function, which may be short-range wireless communication or long-range communication (for example, LTE or NR). The processor 11 can call logic instructions in the memory 12 to execute or support the methods in the embodiments of the present application. The electronic device may further include a microphone, a lighting device, and the like.

As a computer-readable storage medium, the memory 12 can be configured to store software programs and computer-executable programs, such as the program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 11 executes functional applications and data processing by running the software programs, instructions or modules stored in the memory 12, thereby implementing or supporting the methods in the embodiments of the present application.

The following introduces the main technical application scenario of the embodiments of the present application. As shown in FIG. 2, a schematic diagram of a technical scenario of a splicing area according to an embodiment of the present application, in which the videos are represented by boxes, the scenario specifically includes:

a splicing area 201, a main video 202 and an auxiliary video 203. The main video 202 is to be located at the main area 2011 of the splicing area 201, and the auxiliary video 203 can be located at the auxiliary area 2012 of the splicing area 201. In actual processing, however, the main video 202 and the auxiliary video 203 have different durations; for example, the main video 202 may last 3 minutes while the auxiliary video lasts 4 minutes or 2 minutes. The size of the auxiliary video 203 may also differ from the size of the auxiliary area. It is therefore necessary to intelligently adjust the corresponding duration and size so as to improve the display quality of the spliced video.

The specific method is described in detail below.

Embodiment 1

Embodiment 1 of the present application provides a short video generation method based on artificial intelligence. The method is executed on the electronic device shown in FIG. 1. FIG. 3 is a schematic flow chart of the method; as shown in FIG. 3, the method includes the following steps:

Step S301: obtain the main video, auxiliary video and splicing area selected by a target object, and extract duration t1 of the main video and duration t2 of the auxiliary video; if t1 > t2, extend the auxiliary video by copying t1 - t2 of it to obtain a first auxiliary extended video whose duration is t1; if t1 < t2, cut t2 - t1 from the auxiliary video to obtain the first auxiliary extended video.

For example, copying t1 - t2 of the auxiliary video to obtain the first auxiliary extended video may specifically comprise:

calculating x = t1/t2 and rounding x up to obtain x'; appending x' - 1 copies of the auxiliary video to the end of the auxiliary video to obtain a concatenation of x' auxiliary videos, and trimming the excess x'·t2 - t1 from this concatenation to obtain the first auxiliary extended video.

The above scheme makes the duration of the auxiliary video exactly match that of the main video, which ensures the continuity of the subsequently generated video and avoids the situation where the auxiliary video is still playing after the main video has finished, or the main video is still playing after the auxiliary video has finished.
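The duration-matching rule above can be sketched in a few lines. This is a minimal illustration that works on durations only; actually looping and trimming the video stream would be done with an editing tool, which the application does not name:

```python
import math

def match_duration(t1: float, t2: float) -> tuple[int, float]:
    """Return (copies, trim) so that the auxiliary video reaches exactly
    the main video's duration t1.

    If t1 > t2 the auxiliary video is looped: x' = ceil(t1 / t2) copies
    are concatenated, then x' * t2 - t1 seconds are trimmed from the tail.
    If t1 <= t2 no copies are added and t2 - t1 seconds are trimmed.
    """
    if t1 > t2:
        x_prime = math.ceil(t1 / t2)   # total copies after looping
        trim = x_prime * t2 - t1       # excess to cut from the tail
        return x_prime, trim
    return 1, t2 - t1                  # just cut the difference

# Example: main video 180 s, auxiliary video 80 s
copies, trim = match_duration(180, 80)   # 3 copies (240 s), trim 60 s
```

In either branch, `copies * t2 - trim` equals t1, so the two videos end at the same instant.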

Step S302: identify the main video through a preset LSTM model to determine main audio information corresponding to the main video, identify the first auxiliary extended video through the preset LSTM model to determine auxiliary audio information corresponding to the first auxiliary extended video, and mute the volume of the auxiliary audio information of the first auxiliary extended video to obtain a second auxiliary extended video.

The above LSTM model may be a general long short-term memory model, and the identification may be performed in an existing manner; the present application does not limit the specific implementation of the LSTM model, which is not repeated here.
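The muting step above can be illustrated with a minimal sketch. It assumes the identified auxiliary track has been decoded to raw PCM samples, which the application does not specify; in a real pipeline one would instead drop or silence the audio stream of the first auxiliary extended video:

```python
def mute_track(samples):
    """Mask the volume of an audio track by zeroing every sample.

    `samples` is assumed to be a sequence of PCM sample values; this is
    a stand-in for silencing the auxiliary audio stream so that only the
    main video's audio remains in the synthesized short video.
    """
    return [0 for _ in samples]

silent = mute_track([0.1, -0.4, 0.25])
```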

Step S303: extract a first size of the main area of the splicing area, adjust the main video to the first size and add it to the main area; extract a second size of the auxiliary area of the splicing area, adjust the second auxiliary extended video to the second size and add it to the auxiliary area; and, after aligning the start times of the main area video and the auxiliary area video, call a video generation command to synthesize the main area video and the auxiliary area video into one short video.
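One plausible realization of this step is an overlay-style composition. The sketch below only builds an ffmpeg command line; the use of ffmpeg, the filter graph, and the example region sizes and coordinates are assumptions for illustration, since the application does not name a specific video generation command:

```python
def build_compose_cmd(main_path, aux_path, main_size, aux_size, aux_pos, out_path):
    """Build an ffmpeg command that scales the main video to the main
    region's size, scales the (muted) auxiliary video to the auxiliary
    region's size, overlays it at aux_pos, and writes one short video.
    Both inputs start at t = 0, so their start times are aligned."""
    w1, h1 = main_size
    w2, h2 = aux_size
    x, y = aux_pos
    filter_complex = (
        f"[0:v]scale={w1}:{h1}[main];"
        f"[1:v]scale={w2}:{h2}[aux];"
        f"[main][aux]overlay={x}:{y}[out]"
    )
    return [
        "ffmpeg", "-i", main_path, "-i", aux_path,
        "-filter_complex", filter_complex,
        "-map", "[out]", "-map", "0:a",   # keep only the main video's audio
        out_path,
    ]

cmd = build_compose_cmd("main.mp4", "aux.mp4", (720, 960), (360, 480), (360, 0), "short.mp4")
```

Mapping only `0:a` matches the previous step: the output carries the main video's audio alone, so no overlapping audio can occur.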

The technical solution provided in the present application obtains the main video, auxiliary video and splicing area selected by a target object and extracts duration t1 of the main video and duration t2 of the auxiliary video. If t1 > t2, the auxiliary video is extended by copying t1 - t2 of it to obtain a first auxiliary extended video whose duration is t1; if t1 < t2, t2 - t1 is cut from the auxiliary video to obtain the first auxiliary extended video. The main video is identified through a preset LSTM model to determine its main audio information, the first auxiliary extended video is identified through the preset LSTM model to determine its auxiliary audio information, and the volume of the auxiliary audio information is muted to obtain a second auxiliary extended video. A first size of the main area of the splicing area is extracted, and the main video is adjusted to the first size and added to the main area; a second size of the auxiliary area is extracted, and the second auxiliary extended video is adjusted to the second size and added to the auxiliary area; after the start times of the main area video and the auxiliary area video are aligned, a video generation command is called to synthesize them into one short video. A video generated in this way preserves the visual continuity of the main video and the auxiliary video, and its audio data contains only the audio of the main video, so overlapping audio tracks cannot degrade the smoothness of audio playback. The solution therefore improves the quality of the spliced video.

For example, adjusting the second auxiliary extended video to the second size and adding it to the auxiliary area may specifically comprise:

extracting a first height value and a first width value from the size of the second auxiliary extended video, and a second height value and a second width value from the second size; calculating the ratio of the first height value to the second height value to obtain a height ratio, and the ratio of the first width value to the second width value to obtain a width ratio; determining, according to the height ratio and the width ratio, a cropping strategy for the second auxiliary extended video and cropping it accordingly to obtain a third auxiliary extended video; and adjusting the third auxiliary extended video to the second size before adding it to the auxiliary area.

For example, determining the cropping strategy for the second auxiliary extended video according to the height ratio and the width ratio, and cropping the second auxiliary extended video to obtain the third auxiliary extended video, may specifically comprise:

if the height ratio is greater than a height threshold and the width ratio is less than a width threshold, cropping the second auxiliary extended video in the height direction to reduce the first height value while keeping the first width value unchanged;

if the height ratio is less than the height threshold and the width ratio is greater than the width threshold, cropping the second auxiliary extended video in the width direction to reduce the first width value while keeping the first height value unchanged;

if the height ratio is greater than the height threshold and the width ratio is greater than the width threshold, cropping the second auxiliary extended video in the height direction to reduce the first height value and in the width direction to reduce the first width value.
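The three branches above can be sketched as a small strategy selector. The default thresholds of 1.0 are illustrative assumptions (the application does not give concrete threshold values), and the strategy names are hypothetical labels:

```python
def crop_strategy(height_ratio, width_ratio, height_threshold=1.0, width_threshold=1.0):
    """Choose the cropping strategy for the second auxiliary extended video.

    The ratios compare the video's height/width to the auxiliary region's
    height/width; a ratio above its threshold means that dimension must
    be cropped before the final resize.
    """
    crop_height = height_ratio > height_threshold
    crop_width = width_ratio > width_threshold
    if crop_height and not crop_width:
        return "crop_height"   # shrink height, keep width unchanged
    if crop_width and not crop_height:
        return "crop_width"    # shrink width, keep height unchanged
    if crop_height and crop_width:
        return "crop_both"     # shrink both dimensions
    return "no_crop"           # neither ratio exceeds its threshold
```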

For example, cropping the second auxiliary extended video in the height direction to reduce the first height value may specifically include:

Calling a neural network model to identify the regions of the second auxiliary extended video to determine the background region and the activity region of the second auxiliary extended video, and reducing the size of the background region in the height direction to reduce the first height value.

For example, cropping the second auxiliary extended video in the width direction to reduce the first width value while keeping the first height value unchanged may specifically include:

Calling the neural network model to identify the regions of the second auxiliary extended video to determine the background region and the activity region of the second auxiliary extended video, and reducing the size of the background region in the width direction to reduce the first width value.

For example, cropping the second auxiliary extended video in the height direction to reduce the first height value, and cropping the second auxiliary extended video in the width direction to reduce the first width value, may specifically include:

Calling the neural network model to identify the regions of the second auxiliary extended video to determine the background region and the activity region of the second auxiliary extended video, and reducing the size of the background region in the width direction and the height direction to reduce the first width value and the first height value.
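The background-first cropping described above can be sketched per frame with NumPy, assuming the neural-network segmentation has already produced a boolean activity mask (the function name, the even top/bottom split of the cut, and treating the activity region as its bounding box are illustrative assumptions):

```python
import numpy as np

def crop_background(frame, activity_mask, target_h, target_w):
    """Crop background rows/columns of one frame toward a target size
    while never cutting into the activity region's bounding box.

    frame: H x W x 3 array; activity_mask: H x W boolean array
    (assumed output of the patent's region-identification model).
    """
    rows = np.flatnonzero(activity_mask.any(axis=1))
    cols = np.flatnonzero(activity_mask.any(axis=0))
    h, w = frame.shape[:2]
    # Height direction: split the cut between top and bottom background.
    cut = max(0, h - target_h)
    cut_top = min(cut // 2, rows[0])
    cut_bot = min(cut - cut_top, h - (rows[-1] + 1))
    frame = frame[cut_top:h - cut_bot]
    # Width direction: same idea on left/right background columns.
    cut = max(0, w - target_w)
    cut_left = min(cut // 2, cols[0])
    cut_right = min(cut - cut_left, w - (cols[-1] + 1))
    frame = frame[:, cut_left:w - cut_right]
    return frame
```

If the available background is smaller than the requested cut, the sketch preserves the activity region and simply stops short of the target size.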

In some splicing scenarios, the person in the main video may need to be replaced, for example because of the user's portrait rights. With the development of AI technology, AI image generation has become very mature, but direct AI video generation cannot reproduce some of the motions in the original main video, such as hand or arm movements. This has a certain impact on special application scenarios such as the promotion of sportswear (for example, dance wear or yoga wear), and therefore needs to be improved.

For example, the above method may further include:

Generating a digital human based on an AI model and generating a first video of the digital human; adjusting the hand motions in the first video according to the hand motions of the main video to obtain a second video; replacing the main video with the second video and adding it to the main video area; and, after aligning the start times of the main-area video and the auxiliary-area video, invoking the video generation command to synthesize the main-area video and the auxiliary-area video into a new short video.

The digital human and the first video of the digital human can be generated with an existing AI model; this application does not limit the specific generation method. The first video is also a video in the same application scenario as the main video: for example, if the main video is a dancing video, the first video should also be a video of a dancing scene. This application mainly adjusts the hand motions of the first video to improve its effect.

For example, adjusting the hand motions in the first video according to the hand motions of the main video to obtain the second video may specifically include:

Recognizing the hand contour of the first frame of the main video to obtain a first hand contour picture; inputting the first hand contour picture into a neural network model to recognize n feature points in the first hand contour picture; selecting one feature point from the n feature points as the origin o1 (generally the outermost of the n feature points, shown as point o1 in Figure 5); calculating, with the origin o1 as the reference, the n-1 complex values corresponding to the remaining n-1 feature points; and constructing a one-to-one mapping between the n-1 complex values and the n-1 feature points to obtain a first mapping table. Extracting from the first video the second frame corresponding to the first frame; recognizing the hand contour of the second frame to obtain a second hand contour picture; inputting the second hand contour picture into the neural network model to recognize n' feature points in the second hand contour picture; selecting one feature point from the n' feature points as the origin o2; calculating, with the origin o2 as the reference, the n'-1 complex values corresponding to the remaining n'-1 feature points; and calculating the differences between the n'-1 complex values and the n-1 complex values to obtain n-1 differences. If the n-1 differences are all within the threshold range, selecting from the n-1 differences the x differences that are less than a first threshold and fixing in place the x feature points of the second hand contour picture corresponding to those x differences; selecting from the remaining n-1-x feature points the j-th feature point adjacent to the i-th of the x fixed feature points; rotating the j-th feature point about the i-th feature point until the difference between the complex value of the rotated feature point j' and the corresponding complex value in the first mapping table is less than the first threshold, then stopping the rotation and fixing the rotated feature point j'; and traversing the n-1-x feature points until the differences between the complex values of all feature points of the second hand contour picture and the complex values in the first mapping table are all less than the first threshold. Traversing all frames in the first video yields the second video after frame skipping.
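The complex-value representation of the feature points relative to the origin, and the selection of the x differences below the first threshold, can be sketched as follows (the function names and the (x, y) pixel-coordinate point format are illustrative assumptions about the contour-recognition output):

```python
def complex_values(points, origin_idx=0):
    """Represent hand feature points as complex values relative to an
    origin feature point, as in the first mapping table.

    points: list of (x, y) coordinates; the point at origin_idx plays
    the role of o1 (or o2) and is excluded from the result.
    """
    ox, oy = points[origin_idx]
    return [complex(x - ox, y - oy)
            for k, (x, y) in enumerate(points) if k != origin_idx]

def small_differences(values_a, values_b, first_threshold):
    """Indices of the x differences whose magnitude is below the first
    threshold; the corresponding feature points are fixed in place."""
    return [k for k, (a, b) in enumerate(zip(values_a, values_b))
            if abs(a - b) < first_threshold]
```

The fixed points then serve as rotation anchors for their neighbours, as the principle paragraph below explains.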

For example, n may specifically be a value greater than 22: the fingers have 15 joints, and adding the farthest points of the 5 fingers and the 2 positions on the edge of the wrist gives at least 22 feature points. Of course, more detailed feature points can also be subdivided. The specific way these feature points are determined can be decided by the manufacturer, as long as the above 15 joint points are included.

The principle of the above adjustment is as follows. Since each finger joint can rotate, its rotation forms different motions. Because the hand motions of the generated first video are mostly the same as those of the main video, only slight adjustments are needed. On this premise, marking in the coordinate system all complex values of the feature points in the finger contour of the first frame of the main video captures the specific motion of the hand. In principle, as long as the feature points in the finger contour of the corresponding second frame of the first video are adjusted to be consistent with the first frame, the specific hand motion can be restored. Because the positions of some feature points are already the same in the two frames, those feature points need no adjustment; taking them as references and adjusting the adjacent feature points achieves the purpose of adjusting the hand video.
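In the complex representation, rotating an adjacent feature point z_j about a fixed anchor z_i by an angle theta is the multiplication z_i + (z_j - z_i) * e^(i*theta). A brute-force sketch of the "rotate until the difference is below the first threshold" step (the angle-search granularity and the function names are illustrative assumptions, not the patent's prescribed search):

```python
import cmath

def rotate_about(z_j, z_i, theta):
    """Rotate feature point z_j about fixed feature point z_i by theta
    radians, using complex multiplication."""
    return z_i + (z_j - z_i) * cmath.exp(1j * theta)

def adjust_point(z_j, z_i, target, first_threshold, steps=3600):
    """Scan rotation angles until the rotated point is within the first
    threshold of the target complex value from the first mapping table."""
    for k in range(steps):
        theta = 2 * cmath.pi * k / steps
        z = rotate_about(z_j, z_i, theta)
        if abs(z - target) < first_threshold:
            return z
    return z_j  # no suitable angle: leave the point unchanged
```

Since rotation about the anchor preserves the distance |z_j - z_i|, a target is only reachable when it lies (within the threshold) on that circle, which matches the premise that the two videos' hand poses differ only slightly.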

For example, the above method may further include:

Replacing the voice information of the synthesized short video with synthesized speech, where the method for generating the synthesized speech specifically includes:

Calling a language model to recognize the voice information and obtain its text information; segmenting the text information to obtain multiple text segments; performing time recognition on the voice information to obtain the multiple time periods corresponding to the text segments, and constructing a first mapping in which the time periods correspond one-to-one to the text segments; performing voiceprint recognition on each text segment to obtain the multiple pieces of voiceprint feature information corresponding to the text segments, and constructing a second mapping in which the voiceprint feature information corresponds one-to-one to the text segments. Merging identical pieces of voiceprint feature information yields α distinct pieces of voiceprint feature information, which are numbered to obtain α numbers. Extracting the first text segment and the corresponding first voiceprint feature information from the second mapping, marking the number of the first voiceprint feature information on the first text segment, and traversing all text segments and the corresponding voiceprint feature information in the second mapping to obtain multiple marked text segments carrying number marks. Collecting the original voiceprint feature information of the target object, constructing α voiceprint coefficients, constructing α pieces of constructed voiceprint feature information from the original voiceprint feature information and the α voiceprint coefficients, and constructing a third mapping in which the α pieces of constructed voiceprint feature information correspond one-to-one to the α numbers. Extracting the first text segment and its number β1; obtaining, according to the third mapping, the β1-th piece of constructed voiceprint feature information corresponding to the number β1; generating the first segment of synthesized speech from the β1-th piece of constructed voiceprint feature information and the first text segment; extracting, according to the first mapping, the first time period corresponding to the first text segment; replacing the voice data of the first time period in the voice information of the short video with the first segment of synthesized speech; and traversing all text segments so that speech is synthesized for every segment.
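The numbering of the α distinct voiceprints and the first and second mappings described above can be sketched as plain dictionaries (the segment data structure with "text", "period" and "voiceprint" keys is an illustrative assumption about the recognition output; voiceprint feature information is assumed hashable so that identical voiceprints compare equal):

```python
def build_mappings(segments):
    """Build the first mapping (text -> time period), the second mapping
    (text -> voiceprint), the number-marked segments, and alpha.

    Distinct voiceprints are numbered 1..alpha in order of first
    appearance, so the same speaker always carries the same number.
    """
    first_mapping = {seg["text"]: seg["period"] for seg in segments}
    second_mapping = {seg["text"]: seg["voiceprint"] for seg in segments}
    numbers = {}
    for seg in segments:
        vp = seg["voiceprint"]
        if vp not in numbers:
            numbers[vp] = len(numbers) + 1
    marked = [(seg["text"], numbers[seg["voiceprint"]]) for seg in segments]
    return first_mapping, second_mapping, marked, len(numbers)
```

The third mapping then pairs each number 1..alpha with a constructed voiceprint, so looking up a segment's number selects the replacement voice for its time period.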

The above α is an integer greater than or equal to 1, and 1 ≤ β1 ≤ α.

For example, the language model can recognize the voice information using existing speech recognition methods. The above technical solution does not limit the specific schemes for speech synthesis and recognition; they can be implemented with a general-purpose large AI model and are not described in detail here.

The above method of constructing the α voiceprint coefficients may specifically include:

Extracting α feature points according to a power function to obtain the α voiceprint coefficients;

The power function is f(b) = b^m, where b is the independent variable whose values are the index values of the α voiceprint coefficients, and the exponent m satisfies -1 ≤ m ≤ 0.5; the specific value of m can be selected by the target object.
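A minimal sketch of the power-function construction, under the assumption that b takes the index values 1..α and m is the user-selected exponent in [-1, 0.5] (the function name and the default exponent are illustrative):

```python
def voiceprint_coefficients(alpha, m=0.5):
    """Evaluate f(b) = b**m for b = 1..alpha to obtain the alpha
    voiceprint coefficients; each coefficient scales the target
    object's original voiceprint into one constructed voiceprint."""
    if not -1 <= m <= 0.5:
        raise ValueError("m must satisfy -1 <= m <= 0.5")
    return [b ** m for b in range(1, alpha + 1)]
```

With b ≥ 1 the coefficients are monotone in b, so each speaker number receives a distinct coefficient and hence a distinct constructed voiceprint.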

The above voice replacement mainly addresses the fact that different people have different voiceprints, whereas replacement voices usually all use one uniform voice, which degrades the video experience. This application therefore constructs multiple coefficients through custom voiceprint feature coefficients, changes the voiceprint features to obtain multiple pieces of voice feature information, and matches them to the corresponding segments of the original voice information, so that the same person uses the same voiceprint feature information while different people have different voiceprint information, thereby improving the effect of the voice information and hence of the synthesized video.

Referring to FIG. 4, FIG. 4 is a schematic structural diagram of an artificial-intelligence-based short video generation system provided by the present application. The system includes:

An acquisition unit 401, configured to acquire the main video, auxiliary video and splicing area selected by the target object, and to extract the time t1 of the main video and the time t2 of the auxiliary video;

An adjustment unit 402, configured to: if t1 > t2, copy the auxiliary video for t1-t2 time to obtain a first auxiliary extended video whose duration is t1; if t1 < t2, edit the auxiliary video to remove t2-t1 time to obtain the first auxiliary extended video;

An identification unit 403, configured to identify the main video through a preset LSTM model to determine the main audio information corresponding to the main video, identify the first auxiliary extended video through the preset LSTM model to determine the auxiliary audio information corresponding to the first auxiliary extended video, and mute the volume of the auxiliary audio information of the first auxiliary extended video to obtain a second auxiliary extended video;

A splicing generation unit 404, configured to extract the first size of the main area of the splicing area, resize the main video to the first size and add it to the main area, extract the second size of the auxiliary area of the splicing area, resize the second auxiliary extended video to the second size and add it to the auxiliary area, align the start times of the main-area video and the auxiliary-area video, and then invoke the video generation command to synthesize the main-area video and the auxiliary-area video into a short video.

The adjustment unit 402 is specifically configured to calculate x = t1/t2 and round x up to obtain x'; add x'-1 copies of the auxiliary video to the tail of the auxiliary video to obtain x' auxiliary videos, and edit the x' auxiliary videos to remove x'·t2-t1 time to obtain the first auxiliary extended video.
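The copy-and-trim arithmetic of the adjustment unit can be sketched directly (the function and variable names are illustrative; t1 and t2 are the main and auxiliary durations in any consistent time unit):

```python
import math

def extend_to_duration(aux_duration, main_duration):
    """Compute how many copies of the auxiliary video are concatenated
    (x' = ceil(t1/t2)) and how much time is then trimmed from the
    concatenation so the result lasts exactly main_duration (t1)."""
    x_prime = math.ceil(main_duration / aux_duration)
    trimmed = x_prime * aux_duration - main_duration
    return x_prime, trimmed
```

For example, a 7-second auxiliary video matched to a 20-second main video is concatenated three times (21 seconds) and then trimmed by one second.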

The splicing generation unit 404 is specifically configured to extract the first height value and the first width value of the size of the second auxiliary extended video, extract the second height value and the second width value of the second size, calculate the ratio of the first height value to the second height value to obtain a height ratio, and calculate the ratio of the first width value to the second width value to obtain a width ratio; determine a cropping strategy for the second auxiliary extended video according to the height ratio and the width ratio, crop the second auxiliary extended video to obtain a third auxiliary extended video, resize the third auxiliary extended video to the second size, and add it to the auxiliary area.

The splicing generation unit 404 is specifically configured to: if the height ratio is greater than the height threshold and the width ratio is less than the width threshold, crop the second auxiliary extended video in the height direction to reduce the first height value while keeping the first width value unchanged;

if the height ratio is less than the height threshold and the width ratio is greater than the width threshold, crop the second auxiliary extended video in the width direction to reduce the first width value while keeping the first height value unchanged;

if the height ratio is greater than the height threshold and the width ratio is greater than the width threshold, crop the second auxiliary extended video in the height direction to reduce the first height value, and crop the second auxiliary extended video in the width direction to reduce the first width value.

The splicing generation unit 404 is specifically configured to call a neural network model to identify the regions of the second auxiliary extended video to determine the background region and the activity region of the second auxiliary extended video, and reduce the size of the background region in the height direction to reduce the first height value;

call the neural network model to identify the regions of the second auxiliary extended video to determine the background region and the activity region of the second auxiliary extended video, and reduce the size of the background region in the width direction to reduce the first width value;

call the neural network model to identify the regions of the second auxiliary extended video to determine the background region and the activity region of the second auxiliary extended video, and reduce the size of the background region in the width direction and the height direction to reduce the first width value and the first height value.

In the embodiments of the present application, the electronic device may be divided into functional units according to the above method examples. For example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of the present application is schematic and is only a logical function division; there may be other division methods in actual implementation.

The above embodiments may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When implemented using software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable system. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired or wireless means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that contains one or more sets of available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive.

An embodiment of the present application also provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any method described in the above method embodiments; the computer includes an electronic device.

An embodiment of the present application also provides a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any method described in the above method embodiments. The computer program product may be a software installation package; the computer includes an electronic device.

It should be understood that, in the various embodiments of the present application, the magnitude of the serial numbers of the above processes does not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.

In the several embodiments provided in the present application, it should be understood that the disclosed method and system can be implemented in other ways. For example, the system embodiments described above are only schematic; the division of the units is only a logical function division, and there may be other division methods in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, systems or units, and may be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may be physically separate, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of hardware plus software functional units.

The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform some steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a volatile memory or a non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM) or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM) and direct rambus RAM (DR RAM), and other media that can store program code.

Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions without departing from the spirit and scope of the present invention, and various changes and modifications can be made, including combinations of the different functions and implementation steps described above, as well as software and hardware implementations, all of which fall within the protection scope of the present invention.

Claims (10)

1. A short video generation method based on artificial intelligence, the method comprising the steps of:
acquiring a main video, an auxiliary video and a splicing area selected by a target object, and extracting time t1 of the main video and time t2 of the auxiliary video; if t1 is greater than t2, copying the auxiliary video for t1-t2 time to obtain a first auxiliary extension video, wherein the duration of the first auxiliary extension video is t1; if t1 is less than t2, editing the auxiliary video to remove t2-t1 time to obtain the first auxiliary extension video;
identifying main audio information corresponding to the main video through a preset LSTM model, identifying auxiliary audio information corresponding to the first auxiliary extension video through the preset LSTM model, and shielding the volume of the auxiliary audio information of the first auxiliary extension video to obtain a second auxiliary extension video;
extracting a first size of a main area of the splicing area, adjusting the size of the main video to the first size and adding it to the main area, extracting a second size of an auxiliary area of the splicing area, adjusting the size of the second auxiliary extension video to the second size and adding it to the auxiliary area, aligning the start times of the main-area video and the auxiliary-area video, and then invoking a video generation command to synthesize the main-area video and the auxiliary-area video into a short video;
generating a digital person based on an AI model, generating a first video of the digital person, adjusting the hand motion in the first video according to the hand motion of the main video to obtain a second video, replacing the main video with the second video and adding it to the main video area, aligning the start times of the main-area video and the auxiliary-area video, and invoking the video generation command to synthesize the main-area video and the auxiliary-area video into a new short video;
wherein adjusting the hand motion in the first video according to the hand motion of the main video to obtain the second video specifically comprises the following steps:
identifying the hand contour of a first frame of the main video to obtain a first hand contour picture, inputting the first hand contour picture into a neural network model to identify n feature points in the first hand contour picture, selecting one feature point from the n feature points as an origin o1, calculating n-1 complex values corresponding to the remaining n-1 feature points based on the origin o1, and constructing a one-to-one mapping between the n-1 complex values and the n-1 feature points to obtain a first mapping table; extracting a second frame corresponding to the first frame from the first video, identifying the hand contour of the second frame to obtain a second hand contour picture, inputting the second hand contour picture into the neural network model to identify n' feature points in the second hand contour picture, selecting one feature point from the n' feature points as an origin o2, calculating n'-1 complex values corresponding to the remaining n'-1 feature points based on the origin o2, and respectively calculating differences between the n'-1 complex values and the n-1 complex values to obtain n-1 differences; if the n-1 differences are all within a threshold range, selecting x differences less than a first threshold from the n-1 differences, fixing the x feature points corresponding to the x differences in the second hand contour picture, selecting from the n-1-x feature points a j-th feature point adjacent to an i-th feature point among the x feature points, rotating the j-th feature point about the i-th feature point so that the difference between the complex value of the rotated feature point j' and the corresponding complex value in the first mapping table is less than the first threshold, then stopping the rotation and fixing the rotated feature point j', and traversing the n-1-x feature points so that the differences between the complex values of all feature points of the second hand contour picture and the complex values in the first mapping table are all less than the first threshold; and traversing all frames in the first video to obtain the second video after frame skipping.
2. The short video generation method based on artificial intelligence according to claim 1, wherein the obtaining of the first auxiliary extension video by copying the auxiliary video to make up the t1-t2 time difference specifically comprises:
calculating x = t1/t2 and rounding x up to obtain x'; appending x'-1 copies of the auxiliary video to the tail of the auxiliary video to obtain x' auxiliary videos; and concatenating the x' auxiliary videos and trimming x'·t2-t1 of time to obtain the first auxiliary extension video.
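Read as arithmetic, claim 2 computes a copy count and a trim amount. A minimal sketch, under the assumption that the removed time is x'·t2 − t1 so that the result lasts exactly t1:

```python
import math

def extend_auxiliary(t1, t2):
    """Copy count and trim time for extending an auxiliary video of
    duration t2 to the main-video duration t1 (assumes t1 > t2 > 0)."""
    x_prime = math.ceil(t1 / t2)   # x = t1/t2 rounded up to x'
    trim = x_prime * t2 - t1       # time cut from the concatenated tail
    return x_prime, trim
```

With t1 = 10 s and t2 = 3 s this yields 4 copies and a 2 s trim (4·3 − 10), leaving exactly 10 s of footage.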
3. The short video generation method based on artificial intelligence according to claim 1, wherein the step of adding the second auxiliary extension video to the auxiliary area after the second auxiliary extension video is resized to the second size specifically comprises:
Extracting a first height value and a first width value of the size of the second auxiliary extension video, extracting a second height value and a second width value of the second size, calculating the ratio between the first height value and the second height value to obtain a height ratio, and calculating the ratio between the first width value and the second width value to obtain a width ratio; and determining a splicing strategy of the second auxiliary extension video according to the height ratio and the width ratio, splicing the second auxiliary extension video to obtain a third auxiliary extension video, and resizing the third auxiliary extension video to the second size and adding it to the auxiliary area.
4. The short video generating method based on artificial intelligence according to claim 3, wherein the determining the splicing strategy of the second auxiliary extension video according to the height ratio and the width ratio to splice the second auxiliary extension video to obtain the third auxiliary extension video specifically comprises:
if the height ratio is greater than the height threshold and the width ratio is less than the width threshold, cropping the second auxiliary extension video in the height direction to reduce the first height value while keeping the first width value unchanged;
if the height ratio is less than the height threshold and the width ratio is greater than the width threshold, cropping the second auxiliary extension video in the width direction to reduce the first width value while keeping the first height value unchanged;
if the height ratio is greater than the height threshold and the width ratio is greater than the width threshold, cropping the second auxiliary extension video in the height direction to reduce the first height value and cropping the second auxiliary extension video in the width direction to reduce the first width value.
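The three branches of claim 4 reduce to two independent threshold comparisons; a sketch, with threshold values left to the caller since the claim does not fix them:

```python
def crop_strategy(first_h, first_w, second_h, second_w,
                  height_threshold, width_threshold):
    """Decide the cropping directions of claims 3-4 from the height and
    width ratios between the extension-video size (first_h, first_w) and
    the auxiliary-area size (second_h, second_w).
    Returns (crop_height, crop_width)."""
    height_ratio = first_h / second_h
    width_ratio = first_w / second_w
    return height_ratio > height_threshold, width_ratio > width_threshold
```

For example, `crop_strategy(1200, 600, 1000, 600, 1.1, 1.1)` gives a height ratio of 1.2 and a width ratio of 1.0, so only the height direction is cropped.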
5. The artificial intelligence based short video generation method according to claim 4, wherein the cropping the second auxiliary extension video in the height direction to reduce the first height value specifically comprises:
invoking a neural network model to identify the region of the second auxiliary extension video, determining a background region and an active region of the second auxiliary extension video, and reducing the size of the background region in the height direction so as to reduce the first height value;
The cropping of the second auxiliary extension video in the width direction to reduce the first width value while keeping the first height value unchanged specifically comprises:
invoking a neural network model to identify the regions of the second auxiliary extension video, determining a background region and an active region of the second auxiliary extension video, and reducing the size of the background region in the width direction so as to reduce the first width value;
The step of cutting the second auxiliary extension video in the height direction to reduce the first height value, and cutting the second auxiliary extension video in the width direction to reduce the first width value specifically includes:
And calling a neural network model to identify the region of the second auxiliary extension video, determining the background region and the active region of the second auxiliary extension video, and reducing the size of the background region in the width direction and the height direction so as to reduce the first width value and the first height value.
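Claim 5's "shrink only the background" step can be sketched on a frame represented as a list of rows; the even split of the cut between the top and bottom background bands is an assumption, as is the bounding-box representation of the active region:

```python
def crop_background_height(frame, active_top, active_bottom, target_height):
    """Remove rows only outside the active region [active_top, active_bottom)
    until the frame is target_height rows tall; the active rows survive intact."""
    rows_to_cut = len(frame) - target_height
    cut_top = min(rows_to_cut // 2, active_top)   # background rows above the action
    cut_bottom = rows_to_cut - cut_top            # the rest come from below
    assert cut_bottom <= len(frame) - active_bottom, "not enough background to cut"
    return frame[cut_top: len(frame) - cut_bottom]
```

A 10-row frame with the active region at rows 3-6 and a target of 8 rows loses one background row from each side; the active rows merely shift up by one. Width-direction cropping is the same operation applied to columns.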
6. An artificial intelligence based short video generation apparatus, the apparatus comprising:
the acquisition unit is used for acquiring the main video, the auxiliary video and the splicing area selected by the target object, and extracting the time t1 of the main video and the time t2 of the auxiliary video;
the adjusting unit is used for, if t1 is greater than t2, copying the auxiliary video to obtain a first auxiliary extension video, wherein the duration of the first auxiliary extension video is t1; and, if t1 is less than t2, trimming t2-t1 of time from the auxiliary video to obtain the first auxiliary extension video;
The identification unit is used for identifying and determining main audio information corresponding to the main video through a preset LSTM model, identifying and determining auxiliary audio information corresponding to the first auxiliary extension video through the preset LSTM model, and shielding the volume of the auxiliary audio information of the first auxiliary extension video to obtain a second auxiliary extension video;
The splicing generation unit is used for extracting a first size of the main area of the splicing area, resizing the main video to the first size and adding it to the main area, extracting a second size of the auxiliary area of the splicing area, resizing the second auxiliary extension video to the second size and adding it to the auxiliary area, aligning the start time of the main area video with the start time of the auxiliary area video, and calling a video generation command to synthesize the main area video and the auxiliary area video into a short video;
the apparatus is further used for generating a digital person based on an AI model, generating a first video of the digital person, adjusting the hand motion in the first video according to the hand motion of the main video to obtain a second video, replacing the main video with the second video and adding the second video to the main area, aligning the start time of the main area video with the start time of the auxiliary area video, and calling a video generation command to synthesize the main area video and the auxiliary area video into a new short video;
the step of adjusting the hand motion in the first video according to the hand motion of the main video to obtain the second video specifically comprises:
identifying the hand contour of a first frame of the main video to obtain a first hand contour picture; inputting the first hand contour picture into a neural network model to identify n feature points in the first hand contour picture; selecting one of the n feature points as an origin o1 and calculating, based on the origin o1, n-1 complex values corresponding to the remaining n-1 feature points; constructing a first mapping table by mapping the n-1 complex values one-to-one onto the n-1 feature points; extracting a second frame corresponding to the first frame in the first video and identifying the hand contour of the second frame to obtain a second hand contour picture; inputting the second hand contour picture into the neural network model to identify n' feature points in the second hand contour picture; selecting one of the n' feature points as an origin o2 and calculating, based on the origin o2, n'-1 complex values corresponding to the remaining n'-1 feature points; respectively calculating difference values between the n'-1 complex values and the n-1 complex values in the first mapping table; if all n-1 difference values are smaller than a first threshold, keeping the second frame unchanged; otherwise, for the x feature points whose difference values are not smaller than the first threshold, rotating each such feature point about the origin o2 until its difference value is smaller than the first threshold, and traversing the remaining n-1-x feature points so that the difference values between the complex values of all feature points of the second hand contour picture and the complex values in the first mapping table are smaller than the first threshold; and traversing all frames in the first video frame by frame to obtain the second video.
7. The artificial intelligence based short video generating apparatus according to claim 6, wherein,
The adjusting unit is specifically configured to calculate x = t1/t2 and round x up to obtain x'; append x'-1 copies of the auxiliary video to the tail of the auxiliary video to obtain x' auxiliary videos; and concatenate the x' auxiliary videos and trim x'·t2-t1 of time to obtain the first auxiliary extension video.
8. The artificial intelligence based short video generating apparatus according to claim 6, wherein,
The splicing generation unit is specifically used for extracting a first height value and a first width value of the size of the second auxiliary extension video, extracting a second height value and a second width value of the second size, calculating the ratio between the first height value and the second height value to obtain a height ratio, and calculating the ratio between the first width value and the second width value to obtain a width ratio; and determining a splicing strategy of the second auxiliary extension video according to the height ratio and the width ratio, splicing the second auxiliary extension video to obtain a third auxiliary extension video, and resizing the third auxiliary extension video to the second size and adding it to the auxiliary area.
9. The artificial intelligence based short video generating apparatus according to claim 8, wherein,
The splicing generation unit is specifically configured to crop the second auxiliary extension video in the height direction to reduce the first height value while keeping the first width value unchanged if the height ratio is greater than the height threshold and the width ratio is less than the width threshold;
if the height ratio is less than the height threshold and the width ratio is greater than the width threshold, crop the second auxiliary extension video in the width direction to reduce the first width value while keeping the first height value unchanged;
if the height ratio is greater than the height threshold and the width ratio is greater than the width threshold, crop the second auxiliary extension video in the height direction to reduce the first height value and crop the second auxiliary extension video in the width direction to reduce the first width value.
10. The artificial intelligence based short video generating apparatus according to claim 9, wherein,
The splicing generation unit is specifically used for calling the neural network model to identify the region of the second auxiliary extension video, determining the background region and the active region of the second auxiliary extension video, and reducing the size of the background region in the height direction so as to reduce the first height value;
Invoking a neural network model to identify the regions of the second auxiliary extension video, determining a background region and an active region of the second auxiliary extension video, and reducing the size of the background region in the width direction so as to reduce the first width value;
And calling a neural network model to identify the region of the second auxiliary extension video, determining the background region and the active region of the second auxiliary extension video, and reducing the size of the background region in the width direction and the height direction so as to reduce the first width value and the first height value.
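The claims leave the "video generation command" unspecified. One plausible realization, offered purely as an assumption, is an ffmpeg invocation that scales the main and auxiliary videos to their area sizes and stacks them vertically (vstack requires the two inputs to share a width). The sketch only assembles the argument list; all paths and sizes are caller-supplied:

```python
def build_generation_command(main_path, aux_path, out_path,
                             main_w, main_h, aux_w, aux_h):
    """Assemble a hypothetical ffmpeg command that composites the main-area
    video above the auxiliary-area video into one short video."""
    filters = (f"[0:v]scale={main_w}:{main_h}[m];"   # main video -> first size
               f"[1:v]scale={aux_w}:{aux_h}[a];"     # auxiliary -> second size
               f"[m][a]vstack=inputs=2[v]")          # stack main over auxiliary
    return ["ffmpeg", "-y", "-i", main_path, "-i", aux_path,
            "-filter_complex", filters, "-map", "[v]", out_path]
```

Passing the list to `subprocess.run` would perform the synthesis; keeping the command as a list avoids shell-quoting issues with the filter string.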
CN202410929865.XA 2024-07-11 2024-07-11 Short video generation method and system based on artificial intelligence Active CN118474274B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410929865.XA CN118474274B (en) 2024-07-11 2024-07-11 Short video generation method and system based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN118474274A CN118474274A (en) 2024-08-09
CN118474274B true CN118474274B (en) 2024-10-29

Family

ID=92165374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410929865.XA Active CN118474274B (en) 2024-07-11 2024-07-11 Short video generation method and system based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN118474274B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529765A (en) * 2019-09-02 2021-03-19 阿里巴巴集团控股有限公司 Image processing method, apparatus and storage medium
CN116342639A (en) * 2021-12-22 2023-06-27 华为技术有限公司 Image display method, electronic device and medium
CN117131935A (en) * 2023-10-25 2023-11-28 浙商期货有限公司 Knowledge graph construction method oriented to futures field
CN117857719A (en) * 2022-09-30 2024-04-09 北京字跳网络技术有限公司 Video material editing method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101730115B1 (en) * 2016-10-04 2017-04-26 주식회사 삼십구도씨 Apparatus and method for processing image
CN108391074A (en) * 2018-02-14 2018-08-10 深圳市道通科技股份有限公司 Processing method, device, diagnostic device and the system of the diagnosis case of the vehicles
CN111083138B (en) * 2019-12-13 2022-07-12 北京秀眼科技有限公司 Short video production system, method, electronic device and readable storage medium
CN112860939B (en) * 2021-02-19 2023-09-26 北京百度网讯科技有限公司 Audio and video data processing method, device, equipment and storage medium
CN118132820A (en) * 2024-04-12 2024-06-04 浙江新蓝网络传媒有限公司 Multi-mode video content analysis method and analysis system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant