CN115695684A - Multimedia data editing method and device, storage medium and electronic device - Google Patents
- Publication number
- CN115695684A (application CN202211196957.9A)
- Authority
- CN
- China
- Prior art keywords
- target
- multimedia data
- initial
- moment
- scene object
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The application discloses a multimedia data editing method and device, a storage medium, and an electronic device, relating to the field of smart homes. The multimedia data editing method includes: extracting initial image features and initial sound features from initial multimedia data collected from a scene object in a target scene; identifying a target emotional intensity of the scene object according to the initial image features and the initial sound features; and, when the target emotional intensity is greater than or equal to a target intensity threshold, clipping from the initial multimedia data the target multimedia data within a target time period containing a target moment, where the target moment is the moment corresponding to the multimedia data for which the target emotional intensity was identified as greater than or equal to the target intensity threshold. This solves problems in the related art such as the low efficiency of editing multimedia data of highlight moments, and achieves the technical effect of improving that efficiency.
Description
Technical Field
The present application relates to the field of smart homes, and in particular to a multimedia data editing method and device, a storage medium, and an electronic device.
Background
In the prior art, recording highlight moments of daily life usually requires the user to actively shoot with a mobile phone, camera, or the like. On the one hand, such a recording method may cause highlight moments to be missed because the user does not manage to start shooting in time; on the other hand, the subject is consciously aware of being filmed, so the result is "deliberate" and may fail to capture the user's most natural highlight moments.
No effective solution has yet been proposed for problems in the related art such as the low efficiency of editing multimedia data of highlight moments.
Summary of the Invention
Embodiments of the present application provide a multimedia data editing method and device, a storage medium, and an electronic device, so as to at least solve problems in the related art such as the low efficiency of editing multimedia data of highlight moments.
According to one embodiment of the present application, a multimedia data editing method is provided, including: extracting initial image features and initial sound features from initial multimedia data collected from a scene object in a target scene, where the initial image features characterize the emotion of the scene object through the movement of target body parts of the scene object, and the initial sound features characterize the emotion of the scene object through the sounds the scene object makes;
identifying a target emotional intensity of the scene object according to the initial image features and the initial sound features, where the target emotional intensity indicates how intense the emotion of the scene object is;
when the target emotional intensity is greater than or equal to a target intensity threshold, clipping from the initial multimedia data the target multimedia data within a target time period containing a target moment, where the target moment is the moment corresponding to the multimedia data for which the target emotional intensity was identified as greater than or equal to the target intensity threshold.
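As a rough illustration (not part of the claims), the claimed steps can be sketched in Python; the fusion rule here is a stand-in, and only the thresholding logic follows the text:

```python
def recognize_intensity(image_feature, sound_feature):
    # Stand-in fusion rule: average the two cues. A real system would use
    # a trained model over richer expression/action/sound features.
    return 0.5 * image_feature + 0.5 * sound_feature

def find_target_moments(image_feats, sound_feats, threshold):
    """Return the timestamps (indices) whose recognized emotional intensity
    is greater than or equal to the target intensity threshold."""
    return [t for t, (i, s) in enumerate(zip(image_feats, sound_feats))
            if recognize_intensity(i, s) >= threshold]
```

Every moment returned by `find_target_moments` is a candidate target moment around which a segment would then be clipped.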
Optionally, identifying the target emotional intensity of the scene object according to the initial image features and the initial sound features includes:
performing feature fusion on the initial expression features and initial action features included in the initial image features, together with the initial sound features, to obtain an initial fused feature, where the initial expression features characterize the emotion of the scene object through its facial activity, the initial action features characterize the emotion of the scene object through its body movement, and the initial fused feature includes the features whose correlation with the emotion of the scene object is greater than a target threshold;
determining the target emotional intensity of the scene object that matches the initial fused feature.
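As a loose illustration of the fusion step (hypothetical code, not the patent's actual model), the three feature sets can be concatenated and then filtered so that only the components whose correlation with the subject's emotion exceeds the target threshold survive:

```python
def fuse_features(expression, action, sound, correlations, corr_threshold):
    """Concatenate expression, action, and sound feature components, then keep
    only those whose emotion correlation exceeds corr_threshold."""
    combined = expression + action + sound  # plain list concatenation
    if len(combined) != len(correlations):
        raise ValueError("one correlation score is needed per component")
    return [f for f, c in zip(combined, correlations) if c > corr_threshold]
```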
Optionally, determining the target emotional intensity of the scene object that matches the initial fused feature includes:
determining a target expression severity of the scene object that matches the initial fused feature, where the target expression severity indicates how intense the outward behavior of the scene object is;
converting the target expression severity into the target emotional intensity.
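One way to realize the conversion (an assumption for illustration; the patent does not specify the mapping) is a monotone saturating function from outward expression severity to emotional intensity:

```python
def severity_to_intensity(severity, scale=1.0):
    """Map outward expression severity (>= 0) to an emotional intensity in
    [0, 1): larger outward severity always yields larger intensity, but the
    curve saturates so extreme behavior does not grow without bound."""
    if severity < 0:
        raise ValueError("severity must be non-negative")
    return severity / (severity + scale)
```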
Optionally, determining the target emotional intensity of the scene object that matches the initial fused feature includes:
inputting the initial fused feature into a target emotion recognition model, where the target emotion recognition model is obtained by training an initial emotion recognition model with object fused features annotated with emotional intensity labels;
obtaining the target emotional intensity output by the target emotion recognition model.
Optionally, clipping from the initial multimedia data the target multimedia data within the target time period containing the target moment includes one of the following:
clipping the initial multimedia data from a first moment to the target moment to obtain the target multimedia data, where the first moment precedes the target moment by the target time period;
clipping the initial multimedia data from the target moment to a second moment to obtain the target multimedia data, where the second moment follows the target moment by the target time period;
clipping the initial multimedia data from a third moment to a fourth moment to obtain the target multimedia data, where the interval between the third moment and the fourth moment is the target time period and the target moment lies between them.
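The three clipping alternatives above can be sketched as follows (a minimal illustration; timestamps are in seconds and the function name is hypothetical):

```python
def clip_bounds(target, span, mode, rec_start=0.0, rec_end=float("inf")):
    """Return the (start, end) bounds of a clipped segment of length `span`:
    'before' ends at the target moment, 'after' starts at it, and 'around'
    centers the span on it. Bounds are clamped to the recording limits."""
    if mode == "before":
        start, end = target - span, target
    elif mode == "after":
        start, end = target, target + span
    elif mode == "around":
        start, end = target - span / 2, target + span / 2
    else:
        raise ValueError("mode must be 'before', 'after', or 'around'")
    return max(rec_start, start), min(rec_end, end)
```

Clamping matters in practice: a highlight detected near the very start of the recording cannot be preceded by a full span.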
Optionally, after clipping the target multimedia data within the target time period containing the target moment from the initial multimedia data, the method further includes:
determining a target account corresponding to the scene object and a target emotion type corresponding to the target multimedia data, and obtaining from the target multimedia data the target data frame with the highest target emotional intensity;
storing the target multimedia data, with the target data frame as its display cover, in the data set corresponding to the target emotion type under the target account.
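A toy sketch of this storage step (the data layout is assumed for illustration; the patent does not prescribe one) might look like:

```python
def store_highlight(library, account, emotion_type, frames):
    """frames: list of (frame_id, intensity) pairs from the clipped segment.
    The highest-intensity frame becomes the display cover, and the clip is
    filed under the account's collection for that emotion type."""
    cover = max(frames, key=lambda pair: pair[1])[0]
    clips = library.setdefault(account, {}).setdefault(emotion_type, [])
    clips.append({"cover": cover, "frames": [fid for fid, _ in frames]})
    return cover
```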
Optionally, after clipping the target multimedia data within the target time period containing the target moment from the initial multimedia data, the method further includes:
receiving a data call request, where the data call request is used to request multimedia data showing a target scene object that matches a target keyword;
in response to the data call request, splitting the target keyword into a scene attribute and an emotion attribute;
screening candidate multimedia data matching the scene attribute from the multimedia data set stored for the target scene object;
screening, from the candidate multimedia data, the multimedia data matching the emotion attribute as the multimedia data to be called;
displaying the multimedia data to be called.
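The two-pass retrieval above can be sketched like this (assuming, purely for illustration, that a keyword is a simple "scene emotion" pair such as "birthday joy"):

```python
def query_highlights(collection, keyword):
    """Split the keyword into a scene attribute and an emotion attribute,
    then filter the stored clips in two passes: scene first, emotion second."""
    scene, emotion = keyword.split()
    candidates = [clip for clip in collection if clip["scene"] == scene]
    return [clip for clip in candidates if clip["emotion"] == emotion]
```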
According to another embodiment of the present application, a multimedia data editing device is also provided, including:
an extraction module, configured to extract initial image features and initial sound features from initial multimedia data collected from a scene object in a target scene, where the initial image features characterize the emotion of the scene object through the movement of target body parts of the scene object, and the initial sound features characterize the emotion of the scene object through the sounds the scene object makes;
a recognition module, configured to identify a target emotional intensity of the scene object according to the initial image features and the initial sound features, where the target emotional intensity indicates how intense the emotion of the scene object is;
a clipping module, configured to clip, when the target emotional intensity is greater than or equal to a target intensity threshold, the target multimedia data within the target time period containing the target moment from the initial multimedia data, where the target moment is the moment corresponding to the multimedia data for which the target emotional intensity was identified as greater than or equal to the target intensity threshold.
According to yet another aspect of the embodiments of the present application, a computer-readable storage medium is also provided, in which a computer program is stored, where the computer program is configured to execute the above multimedia data editing method when run.
According to yet another aspect of the embodiments of the present application, an electronic device is also provided, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor executes the above multimedia data editing method through the computer program.
In the embodiments of the present application, initial image features and initial sound features are extracted from initial multimedia data collected from a scene object in a target scene, where the initial image features characterize the emotion of the scene object through the movement of target body parts and the initial sound features characterize it through the sounds the scene object makes; a target emotional intensity, indicating how intense that emotion is, is identified from these features; and when the target emotional intensity is greater than or equal to a target intensity threshold, the target multimedia data within the target time period containing the target moment is clipped from the initial multimedia data, the target moment being the moment corresponding to the multimedia data for which the intensity reached the threshold. In other words, the target emotional intensity of the scene object can be identified from the initial image features and initial sound features extracted from the initial multimedia data; an intensity at or above the threshold indicates that a highlight moment of the scene object occurred at the target moment, and in that case the corresponding segment can be clipped out. The emotional intensity of the scene object is thus identified in multiple dimensions by combining its image features and sound features, and the multimedia data of a highlight moment is recorded automatically once the moment is recognized. This technical solution solves problems in the related art such as the low efficiency of editing multimedia data of highlight moments, and achieves the technical effect of improving that efficiency.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the principles of the application.
In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of a hardware environment of a multimedia data editing method according to an embodiment of the present application;
Fig. 2 is a flowchart of a multimedia data editing method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of storing target multimedia data according to an embodiment of the present application;
Fig. 4 is a schematic diagram of registering the face of a scene object according to an embodiment of the present application;
Fig. 5 is a schematic diagram of calling multimedia data according to an embodiment of the present application;
Fig. 6 is a schematic diagram of a multimedia data editing method according to an embodiment of the present application;
Fig. 7 is a structural block diagram of a multimedia data editing device according to an embodiment of the present application.
Detailed Description
To help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the scope of protection of this application.
It should be noted that the terms "first", "second", and so on in the specification, claims, and drawings of the present application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. In addition, the terms "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to that process, method, product, or device.
According to one aspect of the embodiments of the present application, a multimedia data editing method is provided. The method is widely applicable to whole-house intelligent digital control scenarios such as the smart home (Smart Home), smart household appliance ecosystems, and the intelligent residence (Intelligence House) ecosystem. Optionally, in this embodiment, the above multimedia data editing method may be applied to a hardware environment composed of a terminal device 102 and a server 104 as shown in Fig. 1. As shown in Fig. 1, the server 104 is connected to the terminal device 102 through a network and can be used to provide services (such as application services) for the terminal or a client installed on it; a database may be set up on the server or independently of it to provide data storage services for the server 104, and cloud computing and/or edge computing services may be configured on the server or independently of it to provide data computing services for the server 104.
The above network may include but is not limited to at least one of a wired network and a wireless network. The wired network may include but is not limited to at least one of a wide area network, a metropolitan area network, and a local area network; the wireless network may include but is not limited to at least one of WIFI (Wireless Fidelity) and Bluetooth. The terminal device 102 may be but is not limited to a PC, a mobile phone, a tablet computer, a smart air conditioner, a smart range hood, a smart refrigerator, a smart oven, a smart stove, a smart washing machine, a smart water heater, smart laundry equipment, a smart dishwasher, a smart projection device, a smart TV, a smart clothes-drying rack, smart curtains, smart audio-video equipment, a smart socket, a smart sound system, a smart speaker, smart fresh-air equipment, smart kitchen and bathroom equipment, smart bathroom fixtures, a smart sweeping robot, a smart window-cleaning robot, a smart mopping robot, a smart air purifier, a smart steam oven, a smart microwave oven, a smart kitchen water heater, a smart purifier, a smart water dispenser, a smart door lock, and the like.
This embodiment provides a multimedia data editing method applied to the above terminal device. Fig. 2 is a flowchart of a multimedia data editing method according to an embodiment of the present application; as shown in Fig. 2, the process includes the following steps:
Step S202: extract initial image features and initial sound features from initial multimedia data collected from a scene object in a target scene, where the initial image features characterize the emotion of the scene object through the movement of target body parts of the scene object, and the initial sound features characterize the emotion of the scene object through the sounds the scene object makes;
Step S204: identify a target emotional intensity of the scene object according to the initial image features and the initial sound features, where the target emotional intensity indicates how intense the emotion of the scene object is;
Step S206: when the target emotional intensity is greater than or equal to a target intensity threshold, clip from the initial multimedia data the target multimedia data within a target time period containing a target moment, where the target moment is the moment corresponding to the multimedia data for which the target emotional intensity was identified as greater than or equal to the target intensity threshold.
Through the above steps, the target emotional intensity of the scene object can be identified from the initial image features and initial sound features extracted from the initial multimedia data. A target emotional intensity greater than or equal to the target intensity threshold indicates that a highlight moment of the scene object occurred at the target moment; in that case, the target multimedia data within the target time period containing the target moment can be clipped from the initial multimedia data. The emotional intensity of the scene object is thus identified in multiple dimensions by combining its image features and sound features, and the multimedia data of a highlight moment is recorded automatically once the moment is recognized. This technical solution solves problems in the related art such as the low efficiency of editing multimedia data of highlight moments, and achieves the technical effect of improving that efficiency.
In the technical solution provided in step S202 above, initial image features characterizing the movement of the scene object's facial parts and limbs, and initial sound features characterizing the sounds the scene object makes, may be (but are not limited to being) extracted from the initial multimedia data, so that the influence of facial movement, body movement, voice, and other factors on the scene object's emotion is considered in multiple dimensions.
Optionally, in this embodiment, the target scene may include but is not limited to one or more scene objects. The target scene may be but is not limited to a home scene, an office scene, or a sports scene, and a scene object may be but is not limited to one or more family members in a home scene, one or more office workers in an office scene, or one or more athletes in a sports scene.
Optionally, in this embodiment, the initial multimedia data may include but is not limited to video data, image data, and audio data recorded for the target scene. Smart devices with camera and voice capabilities, such as smart speakers, smart TVs, smart sweeping robots, and smart sound systems, may (but are not limited to) detect scene objects in the target scene in real time. When a scene object is detected in the target scene and that object makes sounds or exhibits body and facial movement, the initial image features and initial sound features can be extracted from the initial multimedia data in real time, which improves the efficiency of extracting them.
Optionally, in this embodiment, the initial image features may include but are not limited to features of the scene object's facial movement (for example, the movement distance of the eyebrows and the upward or downward movement distance of the corners of the mouth) and features of its body movement (for example, how high a leg is lifted, how widely an arm is swung, and the bending angle of the body); the initial sound features may include but are not limited to the amplitude, frequency, and other characteristics of the sounds the scene object makes.
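To make these cues concrete, a per-frame feature record might look like the following (the field names and values are purely illustrative, not defined by the patent):

```python
# Hypothetical per-frame measurements mirroring the cues named above.
face = {"brow_raise_mm": 4.0, "mouth_corner_up_mm": 6.0, "mouth_corner_down_mm": 0.0}
body = {"leg_lift_deg": 10.0, "arm_swing_deg": 70.0, "torso_bend_deg": 5.0}
voice = {"amplitude_db": 72.0, "pitch_hz": 310.0}

def image_feature_vector(face, body):
    """Flatten the facial and limb measurements into one image feature vector."""
    return list(face.values()) + list(body.values())
```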
In the technical solution provided in step S204 above, once the initial image features and initial sound features have been extracted from the initial multimedia data, the influence of both sets of features (and similar factors) on the scene object's emotion and on how intense that emotion is may be (but is not limited to being) considered together, so that the intensity of the scene object's emotion is identified in multiple dimensions and the accuracy of the target emotional intensity is improved.
Optionally, in this embodiment, the emotions of the scene object may include but are not limited to: joy (for example fondness, delight, and liking); anger (for example rage, irritation, and resentment); sorrow (for example sadness, grief, pity, and melancholy); happiness (for example pleasure, cheerfulness, and gladness); surprise (for example astonishment, shock, alarm, amazement, and pleasant surprise); fear (for example panic, dread, worry, and apprehension); and longing (for example missing and yearning for someone).
Optionally, in this embodiment, emotions of the scene object with different degrees of severity may, but need not, be assigned the same emotional intensity. The rationale for assigning emotions of different severity the same emotional intensity is that, for a given scene object, some emotions may only be worth recording once they become sufficiently intense (for example, sadness may only be recorded once the subject is crying loudly, while happiness is recorded as soon as it appears). In such cases, mapping emotions of different severity to the same emotional intensity allows users to filter which emotions should be recorded.
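The per-emotion recording policy described above can be sketched minimally as follows. The emotion names, severity scale, and threshold values are illustrative assumptions, not taken from the patent:

```python
# Hypothetical per-emotion severity thresholds: an emotion maps to the
# recordable intensity level only once its observed severity (in [0, 1])
# reaches that emotion's own threshold.
RECORD_SEVERITY = {
    "sad": 0.9,    # sadness is recorded only once very intense (loud crying)
    "happy": 0.1,  # happiness is recorded as soon as it appears
}

def emotional_intensity(emotion, severity, record_level=1.0):
    """Map emotions of different severity to the same emotional intensity,
    according to each emotion's recording threshold."""
    threshold = RECORD_SEVERITY.get(emotion, 0.5)
    return record_level if severity >= threshold else 0.0

# Mild happiness and intense sadness receive the same recordable intensity,
# while mild sadness does not trigger recording.
assert emotional_intensity("happy", 0.2) == 1.0
assert emotional_intensity("sad", 0.95) == 1.0
assert emotional_intensity("sad", 0.5) == 0.0
```

This is one way to realize "different severities, same intensity": the threshold, not the raw severity, decides when the emotion becomes recordable.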
In an exemplary embodiment, the target emotional intensity of the scene object may be recognized, but not limited to, in the following way: performing feature fusion on the initial expression features and initial action features included in the initial image features, together with the initial sound features, to obtain initial fusion features, where the initial expression features represent the emotion of the scene object through its facial activity, the initial action features represent the emotion of the scene object through its body activity, and the initial fusion features include the features whose correlation with the emotion of the scene object is greater than a target threshold; and then determining the target emotional intensity of the scene object that matches the initial fusion features.
Optionally, in this embodiment, when performing feature fusion on the initial expression features, initial action features, and initial sound features, the effective features may first be screened out of them. For example: among the initial expression features, the movement distance of the eyebrows and the upward or downward movement distance of the corners of the mouth are effective features, while the color of the eyebrows or lips is not; among the initial action features, the range of motion of the legs, the swinging amplitude of the arms, and the bending angle of the body are effective features, while the skin color of the limbs or the build of the body is not; among the initial sound features, the amplitude and frequency of the sound emitted by the scene object are effective features, while the accent of the scene object is not. The screened effective expression, action, and sound features are then fused, so that the initial fusion features accurately characterize the intensity of the scene object's emotion, improving the accuracy of the initial fusion features.
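The screening-then-fusion step described above can be sketched as follows. The feature names and correlation scores are hypothetical placeholders; the patent does not specify how the correlations are obtained:

```python
def fuse_features(expression, action, sound, correlations, threshold=0.5):
    """Keep only the features whose correlation with the emotion exceeds the
    target threshold, then concatenate the survivors into one fused vector."""
    fused = []
    for name, value in {**expression, **action, **sound}.items():
        if correlations.get(name, 0.0) > threshold:
            fused.append(value)
    return fused

# Hypothetical feature values and correlation scores.
expression = {"eyebrow_move_dist": 0.8, "eyebrow_color": 0.3}
action = {"arm_swing_range": 0.6, "body_shape": 0.2}
sound = {"amplitude": 0.9, "accent": 0.1}
correlations = {"eyebrow_move_dist": 0.9, "arm_swing_range": 0.7,
                "amplitude": 0.8, "eyebrow_color": 0.1,
                "body_shape": 0.05, "accent": 0.2}

# Only the effective features (high correlation with emotion) survive fusion.
assert fuse_features(expression, action, sound, correlations) == [0.8, 0.6, 0.9]
```

Ineffective features such as eyebrow color, body build, or accent are dropped before fusion, matching the screening behavior described in the text.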
Optionally, in this embodiment, the initial expression features may represent the emotion of the scene object through the movement of the eyebrows, eyes, mouth, nose, muscles, and so on of the scene object's face; the initial action features may represent the emotion of the scene object through the activity of its limbs (for a person, the two upper limbs, including the shoulders, arms, elbows, forearms, wrists, and hands, and the two lower limbs, including the hips, thighs, knees, shins, and feet). By fusing the effective features among the initial expression features, initial action features, and initial sound features, the accuracy of the target emotional intensity is improved.
In an exemplary embodiment, the target emotional intensity may be determined, but not limited to, in the following way: determining the target performance severity of the scene object that matches the initial fusion features, where the target performance severity indicates how intense the scene object's outward performance is; and converting the target performance severity into the target emotional intensity.
Optionally, in this embodiment, the severity of the scene object's outward performance may first be determined from the initial fusion features, and the target performance severity may then be converted into the intensity of the scene object's inner emotion. This determines the intensity of the scene object's emotion from its actual outward performance, improving the accuracy of the target emotional intensity.
Optionally, in this embodiment, the severity of the scene object's facial activity may be determined from the range, distance, and direction of motion of the eyebrows, eyes, mouth, nose, muscles, and so on of the scene object's face. The severity of the scene object's body activity may be determined from the range, distance, and direction of motion of its limbs. The severity of the sound emitted by the scene object may be determined from its loudness, amplitude, pitch, and so on. The target performance severity may be determined by combining the severity of the scene object's facial activity, body activity, and emitted sound, improving the accuracy of the target performance severity.
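A minimal sketch of the two-step conversion just described (outward performance severity, then discrete emotional intensity). The weights and level boundaries are assumptions for illustration, not values from the patent:

```python
def performance_severity(face, body, sound, weights=(0.4, 0.3, 0.3)):
    """Combine facial-activity, body-activity, and sound severities
    (each normalized to [0, 1]) into one outward-performance severity."""
    wf, wb, ws = weights
    return wf * face + wb * body + ws * sound

def to_emotional_intensity(severity, levels=(0.3, 0.6, 0.9)):
    """Convert outward-performance severity into a discrete emotional
    intensity from 0 (calm) up to len(levels)."""
    return sum(severity >= level for level in levels)

# A moderately animated subject: severity 0.7 maps to intensity level 2.
s = performance_severity(face=0.8, body=0.6, sound=0.7)
assert to_emotional_intensity(0.7) == 2
```

The weighted sum realizes "combining facial activity, body activity, and sound"; the level boundaries realize the conversion from outward severity to inner emotional intensity.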
In an exemplary embodiment, the target emotional intensity may be determined, but not limited to, in the following way: inputting the initial fusion features into a target emotion recognition model, where the target emotion recognition model is obtained by training an initial emotion recognition model with object fusion features labeled with emotional intensity labels; and obtaining the target emotional intensity output by the target emotion recognition model.
Optionally, in this embodiment, the emotional intensity labels may include, but are not limited to, the degree of happiness (for example: extremely happy, very happy, moderately happy), the degree of sadness (for example: very sad, slightly sad, not sad), and the degree of surprise (for example: very surprised, slightly surprised, not surprised at all).
Optionally, in this embodiment, the initial fusion features may be input into the target emotion recognition model, and the result output by the model may be taken as the target emotional intensity. This allows the target emotional intensity corresponding to the initial fusion features to be obtained quickly, improving the efficiency of obtaining the target emotional intensity.
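The inference step can be illustrated with a stand-in for the trained model. The patent does not specify the model architecture; here a logistic regressor with fixed, hypothetical weights plays the role of the trained target emotion recognition model:

```python
import math

# Hypothetical weights standing in for a trained emotion recognition model.
WEIGHTS = [1.2, 0.8, 1.5]
BIAS = -1.0

def predict_intensity(fused_features):
    """Map the initial fusion features to a target emotional intensity in (0, 1)."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, fused_features))
    return 1.0 / (1.0 + math.exp(-z))

# More pronounced fused features yield a higher predicted intensity.
assert predict_intensity([0.8, 0.6, 0.9]) > predict_intensity([0.1, 0.1, 0.1])
```

In the actual system the weights would come from training on fusion features labeled with the emotional intensity labels described above.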
In the technical solution provided in step S206 above, when the target emotional intensity is greater than or equal to the target intensity threshold, the scene object's emotion can be considered sufficiently intense. In this case, the multimedia data within the target time period containing the moment at which the target emotional intensity was recognized as greater than or equal to the target intensity threshold can be cut out of the initial multimedia data as the target multimedia data. Thus, once the facial activity, body activity, and emitted sound of the scene object together indicate that its emotion is sufficiently intense, the target multimedia data of the scene object's highlight moment can be captured automatically and in time, avoiding missed highlights caused by failing to film the scene object in time and improving the efficiency of recording multimedia data.
Optionally, in this embodiment, when the target emotional intensity is less than the target intensity threshold, the scene object's emotion can be considered not yet intense enough. In this case, the initial multimedia data may not be edited; instead, collection of the initial multimedia data continues, the initial image features and initial sound features are extracted from it in real time, and the target emotional intensity of the scene object is recognized from them, until the target emotional intensity of the scene object is greater than or equal to the target intensity threshold, at which point the target multimedia data is cut out of the initial multimedia data.
Optionally, in this embodiment, when the target emotional intensity is greater than or equal to the target intensity threshold, the scene object's emotion can be considered sufficiently intense (for example, the scene object is screaming, clapping, laughing, crying, or roughhousing; or exhibits vigorous body movement such as clutching its stomach, falling over, dancing with joy, or jumping around; or shows expressions of surprise, sadness, fear, anger, happiness, disgust, contempt, and so on). In such cases, the most authentic multimedia data of the scene object can be recorded automatically while it is in its most natural state. Not recording all multimedia data greatly saves storage space and removes the subsequent step of picking out the highlight footage, greatly improving the efficiency of recording multimedia data.
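The threshold-triggered capture loop of steps S204/S206 can be condensed into a short sketch (the intensity scorer is passed in as a placeholder callable):

```python
def first_trigger(frames, intensity_of, threshold):
    """Scan frames in arrival order; return the index of the first frame whose
    target emotional intensity reaches the threshold, or None if none does
    (in which case collection simply continues)."""
    for i, frame in enumerate(frames):
        if intensity_of(frame) >= threshold:
            return i
    return None

# Toy frames whose "intensity" is the frame value itself: the third frame
# is the first to reach the threshold and becomes the target moment.
assert first_trigger([0.1, 0.4, 0.95], intensity_of=lambda f: f, threshold=0.9) == 2
```

The returned index plays the role of the target moment around which the target time period is cut.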
In an exemplary embodiment, cutting the target multimedia data within the target time period containing the target moment out of the initial multimedia data may include, but is not limited to, one of the following cases:
Case 1: the target multimedia data is obtained by cutting the initial multimedia data from a first moment to the target moment, where the first moment precedes the target moment by the target time period.
Optionally, in this embodiment, the multimedia data between the first moment (before the target moment) and the target moment may be cut out of the initial multimedia data as the target multimedia data. In this case, the intensity of the scene object's emotion reaches the target intensity threshold at the target moment, so this approach captures the multimedia data of the process by which the intensity of the scene object's emotion builds up to the target intensity threshold.
Case 2: the target multimedia data is obtained by cutting the initial multimedia data from the target moment to a second moment, where the second moment follows the target moment by the target time period.
Optionally, in this embodiment, the multimedia data between the target moment and the second moment (after the target moment) may be cut out of the initial multimedia data as the target multimedia data. In this case, the intensity of the scene object's emotion reaches the target intensity threshold at the target moment and may continue to increase, decrease, or remain unchanged afterwards, so this approach captures the multimedia data of how the intensity of the scene object's emotion evolves after reaching the target intensity threshold.
Case 3: the target multimedia data is obtained by cutting the initial multimedia data from a third moment to a fourth moment, where the third and fourth moments are separated by the target time period and the target moment lies between them.
Optionally, in this embodiment, when the intensity of the scene object's emotion reaches the target intensity threshold at the target moment, the multimedia data between a third moment before the target moment and a fourth moment after it may be cut out of the initial multimedia data as the target multimedia data. This cutting approach records the complete process both before the intensity of the scene object's emotion reaches the target intensity threshold and after it has reached it.
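The three interception cases can be expressed as one window computation. Times are in arbitrary units (e.g. seconds); the even split in the straddling case is an assumption, since the patent only requires the target moment to lie inside the window:

```python
def clip_window(target_time, period, mode, pre=None):
    """Compute the (start, end) interception window around the target moment.
    mode: 'before' (case 1), 'after' (case 2), 'straddle' (case 3).
    For 'straddle', `pre` is how much of the period precedes the target
    moment; it defaults to an even split."""
    if mode == "before":
        return (target_time - period, target_time)
    if mode == "after":
        return (target_time, target_time + period)
    if mode == "straddle":
        pre = period / 2 if pre is None else pre
        return (target_time - pre, target_time - pre + period)
    raise ValueError(f"unknown mode: {mode}")

# Target moment at t=10 with a 4-unit target time period.
assert clip_window(10, 4, "before") == (6, 10)    # build-up to the threshold
assert clip_window(10, 4, "after") == (10, 14)    # evolution after the threshold
assert clip_window(10, 4, "straddle") == (8, 12)  # both sides of the threshold
```

All three windows have the same length (the target time period); they differ only in where the target moment sits inside them.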
In an exemplary embodiment, the target multimedia data may be stored, but not limited to, in the following way: determining the target account corresponding to the scene object and the target emotion type corresponding to the target multimedia data, and obtaining from the target multimedia data the target data frame with the highest target emotional intensity; then storing the target multimedia data, with the target data frame as its display cover, into the data set corresponding to the target emotion type under the target account.
Optionally, in this embodiment, each scene object may have a corresponding account, and each account may contain data sets corresponding to multiple emotion types. Once the target account corresponding to the scene object and the target emotion type of the target multimedia data are determined: if the data set corresponding to the target emotion type already exists, the target multimedia data may be stored directly into that data set under the target account; if target multimedia data of that emotion type is being stored for the first time, the data set corresponding to the target emotion type may first be created, and the target multimedia data then stored into it.
Fig. 3 is a schematic diagram of storing target multimedia data according to an embodiment of the present application. As shown in Fig. 3, the target account corresponding to the scene object may contain data set 1, data set 2, ..., data set m, ..., data set n for emotion type 1, emotion type 2, ..., emotion type m, ..., emotion type n respectively. If the target emotion type corresponding to the target multimedia data is emotion type m, the target multimedia data may be stored into data set m corresponding to emotion type m. In addition, the multimedia data may be stored with encryption, and may be stored according to the storage policy the user prefers (for example, overwrite-when-full or stop-when-full).
Optionally, in this embodiment, the target data frame with the highest target emotional intensity obtained from the target multimedia data represents the peak of the scene object's emotional intensity, which means it is the most exciting frame of the target multimedia data. Using the most exciting frame of each scene object's multimedia data as the cover of that multimedia data lets the user view the most exciting moment directly, improving the browsing experience. In addition, the cover of each scene object can be exported to create photo walls of highlight moments, emoji packs, family video collections, and so on, providing users with a smart and fun experience.
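The storage scheme (per-account, per-emotion-type data sets, with the highest-intensity frame as the cover) can be sketched as follows; the data structures are illustrative stand-ins for whatever storage the device actually uses:

```python
def store_clip(account, emotion_type, clip_frames, intensity_of):
    """Store a clip under the account's data set for its emotion type,
    using the frame with the highest emotional intensity as the cover.
    The data set is created on first use."""
    dataset = account.setdefault(emotion_type, [])
    cover = max(clip_frames, key=intensity_of)
    dataset.append({"cover": cover, "frames": clip_frames})
    return cover

# Frames as (frame_id, intensity) pairs; the peak-intensity frame becomes the cover.
account = {}
cover = store_clip(account, "happy",
                   [("f1", 0.4), ("f2", 0.95), ("f3", 0.7)],
                   intensity_of=lambda f: f[1])
assert cover == ("f2", 0.95)
assert len(account["happy"]) == 1
```

A second clip of the same emotion type would simply append to the existing data set, matching the first-time / existing-set behavior described above.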
In an exemplary embodiment, multimedia data may be retrieved, but not limited to, in the following way: receiving a data call request, where the data call request requests multimedia data showing a target scene object that matches a target keyword; in response to the data call request, splitting the target keyword into scene attributes and emotion attributes; screening candidate multimedia data that matches the scene attributes from the multimedia data set stored for the target scene object; screening, from the candidate multimedia data, the multimedia data that matches the emotion attributes as the multimedia data to be retrieved; and displaying the multimedia data to be retrieved.
Optionally, in this embodiment, before multimedia data is retrieved, the scene objects in the target scene may first register their faces. Fig. 4 is a schematic diagram of registering a scene object's face according to an embodiment of the present application. As shown in Fig. 4, family members (i.e. the scene objects above) in a home scene (i.e. the target scene above) may enter their age, name, and so on in front of a smart device with a camera; the smart device collects images of the scene object, records the family members' face information, and builds a family face information database.
Optionally, in this embodiment, when multimedia data needs to be edited, face recognition may first be performed; only when the recognized face is one recorded in the family face information database can the stored multimedia data be viewed, selected, exported, and so on. Fig. 5 is a schematic diagram of retrieving multimedia data according to an embodiment of the present application. As shown in Fig. 5, smart devices with camera and voice capabilities deployed in the home scene (i.e. the target scene above) perform image and sound collection (i.e. the collection of initial multimedia data above); face recognition is performed against the face information of registered family members recorded in the family face information database, together with action recognition and voice recognition; and the recognized footage of the family members' highlight moments (i.e. the target multimedia data above) is stored. To view, register, or delete the stored footage normally, the user must first log in and pass user verification; only a verified account can perform editing operations such as viewing, registering, or deleting the stored footage. This keeps the multimedia data secure and prevents the user's multimedia data from being viewed by just anyone and thereby leaked. Moreover, if the smart device storing the multimedia data set fails, or needs to be sold or discarded, the binding between the device and the user's account can be cancelled, or all footage can be wiped, protecting the family's privacy.
Optionally, in this embodiment, the user may edit, on a visual interface, the target keyword for the multimedia data they want to view; the target keyword may be split into scene attributes such as the time and place at which the multimedia data was captured (i.e. the target scene above), the members involved (i.e. the scene objects above), and the number of people (i.e. the number of scene objects above), and emotion attributes such as expressions, actions, and sounds.
Optionally, in this embodiment, candidate multimedia data matching the scene attributes may first be screened from the multimedia data set stored for the target scene object, and the multimedia data matching the emotion attributes may then be screened from the candidates as the multimedia data to be retrieved. Alternatively, the multimedia data matching the emotion attributes may be screened first from the multimedia data set stored for the target scene object, and the multimedia data matching the scene attributes may then be screened from it as the multimedia data to be retrieved.
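The two-stage screening just described (scene attributes first, then emotion attributes) can be sketched as follows. The attribute keys and values are hypothetical; real clips would carry richer metadata:

```python
def retrieve(clips, scene_attrs, emotion_attrs):
    """Two-stage screening: first keep clips matching every scene attribute,
    then keep those also matching every emotion attribute."""
    candidates = [c for c in clips
                  if all(c.get(k) == v for k, v in scene_attrs.items())]
    return [c for c in candidates
            if all(c.get(k) == v for k, v in emotion_attrs.items())]

clips = [
    {"place": "living room", "members": "child", "emotion": "happy"},
    {"place": "living room", "members": "child", "emotion": "sad"},
    {"place": "kitchen", "members": "child", "emotion": "happy"},
]
hits = retrieve(clips, {"place": "living room"}, {"emotion": "happy"})
assert hits == [{"place": "living room", "members": "child", "emotion": "happy"}]
```

Swapping the two filter passes gives the alternative order mentioned in the text; the result set is the same, since both stages are conjunctive filters.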
To better understand the editing process of the multimedia data above, the editing flow is described below with reference to an optional embodiment, which is not intended to limit the technical solutions of the embodiments of the present application.
This embodiment provides a method for editing multimedia data. Fig. 6 is a schematic diagram of a method for editing multimedia data according to an embodiment of the present application. As shown in Fig. 6, smart devices such as smart air conditioners, smart TVs, and smart speakers deployed in a home scene (i.e. the target scene above) collect the initial multimedia data of the scene objects in real time; the collected initial multimedia data is input into a facial feature recognition model, an action feature recognition model, and a voice feature recognition model to recognize the initial expression features, initial action features, and initial sound features in the initial multimedia data; the initial expression features, initial action features, and initial sound features may be input into a feature fusion model for feature fusion to obtain initial fusion features; the initial fusion features may be input into an emotion recognition model to recognize the corresponding emotion, obtaining the emotion of the scene object; and the target emotional intensity of that emotion may then be determined. When the target emotional intensity is greater than or equal to the target intensity threshold, the target multimedia data at the moment corresponding to the multimedia data for which the target emotional intensity was recognized as greater than or equal to the target intensity threshold is cut out of the initial multimedia data; when the target emotional intensity is less than the target intensity threshold, collection of the initial multimedia data continues until the target emotional intensity is determined to be greater than or equal to the target intensity threshold, at which point the target multimedia data at the corresponding moment is cut out of the initial multimedia data.
It should be noted that Fig. 6 shows only smart air conditioners, smart TVs, and smart speakers collecting the initial multimedia data; in practice, the initial multimedia data may be collected by any smart device with camera and voice capabilities, such as mobile phones, TVs, home-deployed cameras, tablets, smart screens, smart speakers, smart TVs, smart audio systems, smart air conditioners, and so on. The present application does not limit this.
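The end-to-end flow of Fig. 6 can be condensed into a short sketch. The feature extractor, fusion step, and intensity model are passed in as placeholder callables, and only case 1 interception (a window before the target moment) is shown:

```python
def edit_pipeline(stream, extract, fuse, intensity, threshold, period):
    """Per frame: extract features, fuse them, score the emotional intensity;
    on the first frame reaching the threshold, return the clip covering the
    target time period up to and including the target moment (case 1)."""
    for t, frame in enumerate(stream):
        feats = fuse(*extract(frame))
        if intensity(feats) >= threshold:
            start = max(0, t - period)
            return stream[start:t + 1]
    return None  # emotion never intense enough; collection would continue

# Toy stand-ins: a frame is its own intensity; fusion takes the strongest cue.
stream = [0.1, 0.2, 0.8, 0.9]
clip = edit_pipeline(stream,
                     extract=lambda f: (f, f, f),
                     fuse=lambda e, a, s: max(e, a, s),
                     intensity=lambda x: x,
                     threshold=0.75, period=3)
assert clip == [0.1, 0.2, 0.8]
```

Each placeholder corresponds to one stage of Fig. 6: the recognition models, the feature fusion model, and the emotion recognition model respectively.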
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc), including several instructions for causing a terminal device (which may be a mobile phone, computer, server, network device, or the like) to execute the methods of the embodiments of the present application.
Fig. 7 is a structural block diagram of a multimedia data editing apparatus according to an embodiment of the present application. As shown in Fig. 7, the apparatus includes:
an extraction module 72, configured to extract initial image features and initial sound features from the initial multimedia data collected for a scene object in a target scene, where the initial image features represent the emotion of the scene object through the movement of target object parts on the scene object, and the initial sound features represent the emotion of the scene object through the sound emitted by the scene object;
a recognition module 74, configured to recognize the target emotional intensity of the scene object from the initial image features and the initial sound features, where the target emotional intensity indicates the intensity of the emotion of the scene object; and
an interception module 76, configured to, when the target emotional intensity is greater than or equal to a target intensity threshold, cut the target multimedia data within the target time period containing the target moment out of the initial multimedia data, where the target moment is the moment corresponding to the multimedia data for which the target emotional intensity was recognized as greater than or equal to the target intensity threshold.
Through the above embodiment, the target emotional intensity of the scene object can be recognized from the initial image features and initial sound features extracted from the initial multimedia data. When the target emotional intensity is greater than or equal to the target intensity threshold, this indicates that a highlight moment of the scene object occurred at the target moment; in this case, the target multimedia data within the target time period of the target moment can be cut out of the initial multimedia data. This combines the image features and sound features of the scene object to recognize its emotional intensity multi-dimensionally, and automatically records the multimedia data of a highlight moment once one is recognized. The above technical solution solves the problem in the related art that editing the multimedia data of highlight moments is inefficient, achieving the technical effect of improving the efficiency of editing the multimedia data of highlight moments.
Optionally, the recognition module includes:
a fusion unit, configured to perform feature fusion on the initial expression features and initial action features included in the initial image features, together with the initial sound features, to obtain initial fusion features, where the initial expression features represent the emotion of the scene object through its facial activity, the initial action features represent the emotion of the scene object through its body activity, and the initial fusion features include the features whose correlation with the emotion of the scene object is greater than a target threshold; and
a determining unit, configured to determine the target emotional intensity of the scene object that matches the initial fusion features.
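The fusion unit's behavior can be sketched as concatenating the per-modality features and then keeping only the components whose correlation with the emotion label exceeds the target threshold. The correlation vector is assumed to be known in advance (e.g., computed offline from labeled data); the function name and shapes are illustrative only.

```python
import numpy as np

def fuse_features(expression, action, sound, correlations, corr_threshold=0.3):
    """Concatenate expression, action, and sound features, then keep only
    the components whose correlation with emotion exceeds the threshold."""
    combined = np.concatenate([expression, action, sound])
    mask = np.asarray(correlations) > corr_threshold  # emotion-relevant components
    return combined[mask]
```

This mirrors the claim's filtering condition: the fused feature retains only the dimensions judged relevant to the scene object's emotion.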
Optionally, the determining unit is configured to:
determine the target performance severity of the scene object that matches the initial fusion features, where the target performance severity indicates how intense the external behavior of the scene object is;
convert the target performance severity into the target emotional intensity.
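The severity-to-intensity conversion could take many forms; the text does not fix one. A plausible assumption is a monotone squashing function that maps an unbounded severity score into a bounded intensity score, so that a fixed intensity threshold can be applied downstream:

```python
import math

def severity_to_intensity(severity):
    """Map external performance severity (>= 0) into a bounded [0, 1) score.
    The exponential form here is an illustrative assumption, not the
    patent's specified mapping."""
    return 1.0 - math.exp(-severity)
```

Any strictly increasing mapping would preserve the ordering of moments by severity, which is all the thresholding step requires.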
Optionally, the determining unit is configured to:
input the initial fusion features into a target emotion recognition model, where the target emotion recognition model is obtained by training an initial emotion recognition model with object fusion features labeled with emotional intensity labels;
obtain the target emotional intensity output by the target emotion recognition model.
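The train-then-infer flow can be sketched with the simplest possible model, a linear regressor fit by least squares on fused features labeled with intensity. The patent does not specify the model family, so this is a placeholder for whatever recognition model is actually used:

```python
import numpy as np

def train_intensity_model(fused_features, intensity_labels):
    """Fit a linear model on fused features labeled with emotional intensity."""
    X = np.asarray(fused_features, dtype=float)
    X = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w, *_ = np.linalg.lstsq(X, np.asarray(intensity_labels, dtype=float),
                            rcond=None)
    return w

def predict_intensity(model, fused_feature):
    """Return the model's intensity score for one fused feature vector."""
    x = np.append(np.asarray(fused_feature, dtype=float), 1.0)
    return float(x @ model)
```

Training on labeled fusion features and then reading intensities off new fused features matches the two steps described above, regardless of which model replaces the linear stand-in.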
Optionally, the interception module includes one of the following:
a first interception unit, configured to clip the initial multimedia data from a first moment to the target moment to obtain the target multimedia data, where the first moment precedes the target moment by the target time period;
a second interception unit, configured to clip the initial multimedia data from the target moment to a second moment to obtain the target multimedia data, where the second moment follows the target moment by the target time period;
a third interception unit, configured to clip the initial multimedia data from a third moment to a fourth moment to obtain the target multimedia data, where the interval from the third moment to the fourth moment is the target time period and the target moment lies between the third moment and the fourth moment.
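The three interception strategies reduce to choosing where a fixed-length window sits relative to the target moment: ending at it, starting at it, or containing it (here, centered on it as one concrete choice). A small sketch:

```python
def window_before(target_t, duration):
    """First strategy: the clip ends at the target moment."""
    return (target_t - duration, target_t)

def window_after(target_t, duration):
    """Second strategy: the clip starts at the target moment."""
    return (target_t, target_t + duration)

def window_around(target_t, duration):
    """Third strategy: the clip contains the target moment; centering it
    is one choice, since the claim only requires the moment to lie inside."""
    return (target_t - duration / 2, target_t + duration / 2)
```

All three return a segment of length `duration`; they differ only in whether the highlight's lead-up, aftermath, or both are kept.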
Optionally, the apparatus further includes:
a determining module, configured to, after the target multimedia data within the target time period containing the target moment has been clipped from the initial multimedia data, determine the target account corresponding to the scene object and the target emotion type corresponding to the target multimedia data, and obtain from the target multimedia data the target data frame with the highest target emotional intensity;
a storage module, configured to store the target multimedia data, with the target data frame as its display cover, in the data set corresponding to the target emotion type under the target account.
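The storage step can be sketched as: pick the highest-intensity frame as the cover, then file the clip under the account's collection for that emotion type. The nested dictionary stands in for whatever real storage backend is used, and the record layout is an assumption:

```python
def store_highlight(store, account, emotion_type, clip):
    """Store a clip under store[account][emotion_type], using its
    highest-intensity frame as the display cover.

    clip: list of (frame_id, intensity) pairs for the target segment."""
    cover = max(clip, key=lambda pair: pair[1])[0]  # highest-intensity frame
    store.setdefault(account, {}).setdefault(emotion_type, []).append(
        {"cover": cover, "frames": [frame for frame, _ in clip]})
    return cover
```

Grouping clips by account and emotion type is what later enables the keyword-based retrieval described next.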
Optionally, the apparatus further includes:
a receiving module, configured to receive a data invocation request after the target multimedia data within the target time period containing the target moment has been clipped from the initial multimedia data, where the data invocation request requests multimedia data showing a target scene object that matches a target keyword;
a dividing module, configured to divide the target keyword into a scene attribute and an emotion attribute in response to the data invocation request;
a first screening module, configured to screen candidate multimedia data matching the scene attribute from the multimedia data set stored for the target scene object;
a second screening module, configured to screen, from the candidate multimedia data, the multimedia data matching the emotion attribute as the multimedia data to be invoked;
a display module, configured to display the multimedia data to be invoked.
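The retrieval flow above can be sketched as splitting the keyword into its two attributes and filtering in two passes. The keyword format (a "scene-emotion" string) and the clip record layout are assumptions made for illustration:

```python
def find_clips(collection, keyword):
    """Split the keyword into scene and emotion attributes, then filter
    the stored clips in two passes: first by scene, then by emotion."""
    scene_attr, emotion_attr = keyword.split("-", 1)
    candidates = [c for c in collection if c["scene"] == scene_attr]  # pass 1
    return [c for c in candidates if c["emotion"] == emotion_attr]    # pass 2
```

A query like `"birthday-happy"` would first narrow the set to birthday-scene clips, then keep only those tagged with the happy emotion type.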
An embodiment of the present application further provides a storage medium that includes a stored program, where the program, when run, performs any one of the methods described above.
Optionally, in this embodiment, the storage medium may be configured to store program code for performing the following steps:
S1: extract initial image features and initial sound features from initial multimedia data collected for a scene object in a target scene, where the initial image features represent the emotion of the scene object through the motion of a target object part on the scene object, and the initial sound features represent the emotion of the scene object through the sound it emits;
S2: identify the target emotional intensity of the scene object according to the initial image features and the initial sound features, where the target emotional intensity indicates how intense the emotion of the scene object is;
S3: when the target emotional intensity is greater than or equal to a target intensity threshold, clip from the initial multimedia data the target multimedia data within the target time period containing the target moment, where the target moment is the moment corresponding to the multimedia data for which a target emotional intensity greater than or equal to the target intensity threshold was identified.
An embodiment of the present application further provides an electronic device, including a memory and a processor, where a computer program is stored in the memory and the processor is configured to run the computer program to perform the steps in any one of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, both of which are connected to the processor.
Optionally, in this embodiment, the processor may be configured to perform the following steps through the computer program:
S1: extract initial image features and initial sound features from initial multimedia data collected for a scene object in a target scene, where the initial image features represent the emotion of the scene object through the motion of a target object part on the scene object, and the initial sound features represent the emotion of the scene object through the sound it emits;
S2: identify the target emotional intensity of the scene object according to the initial image features and the initial sound features, where the target emotional intensity indicates how intense the emotion of the scene object is;
S3: when the target emotional intensity is greater than or equal to a target intensity threshold, clip from the initial multimedia data the target multimedia data within the target time period containing the target moment, where the target moment is the moment corresponding to the multimedia data for which a target emotional intensity greater than or equal to the target intensity threshold was identified.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations, which are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of the present application described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed across a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described can be performed in an order different from that given here, or they can each be fabricated as individual integrated circuit modules, or multiple modules or steps among them can be fabricated as a single integrated circuit module. Thus, the present application is not limited to any particular combination of hardware and software.
The above is only the preferred embodiment of the present application. It should be pointed out that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present application, and these improvements and refinements should also be regarded as falling within the protection scope of the present application.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211196957.9A CN115695684A (en) | 2022-09-28 | 2022-09-28 | Multimedia data editing method and device, storage medium and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115695684A true CN115695684A (en) | 2023-02-03 |
Family
ID=85064760
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113709384A (en) * | 2021-03-04 | 2021-11-26 | 腾讯科技(深圳)有限公司 | Video editing method based on deep learning, related equipment and storage medium |
CN114743290A (en) * | 2022-05-17 | 2022-07-12 | 阿维塔科技(重庆)有限公司 | Driving record control method and device and automobile |
CN115035438A (en) * | 2022-05-27 | 2022-09-09 | 中国科学院半导体研究所 | Sentiment analysis method, device and electronic device |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |