
CN111667399B - Method for training style transfer model, method and device for video style transfer - Google Patents

Method for training style transfer model, method and device for video style transfer

Info

Publication number
CN111667399B
CN111667399B
Authority
CN
China
Prior art keywords
image
model
loss function
frames
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010409043.0A
Other languages
Chinese (zh)
Other versions
CN111667399A (en)
Inventor
张依曼
陈醒濠
王云鹤
许春景
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010409043.0A
Publication of CN111667399A
Application granted
Publication of CN111667399B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses a method for training a style transfer model, and a method and device for video style transfer, in the field of artificial intelligence. The method includes: obtaining training data; performing image style transfer processing on N frames of sample content images according to a sample style image through a neural network model, to obtain N frames of predicted composite images; and determining the parameters of the neural network model according to an image loss function between the N frames of sample content images and the N frames of predicted composite images. The image loss function includes a low-rank loss function, which represents the difference between a first low-rank matrix and a second low-rank matrix; the first low-rank matrix is obtained based on the N frames of sample content images and optical flow information, the second low-rank matrix is obtained based on the N frames of predicted composite images and the optical flow information, and the optical flow information represents the position differences of corresponding pixels between adjacent frames among the N frames of sample content images. The technical solution of this application can improve the stability of a video after style transfer processing.

Description

Method for training style transfer model, method and device for video style transfer

Technical Field

This application relates to the field of artificial intelligence, and more specifically, to a method for training a style transfer model, and a method and device for video style transfer, in the field of computer vision.

Background

Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.

Image rendering tasks such as image style transfer have a wide range of application scenarios on terminal devices. With the rapid improvement of terminal device performance and network performance, the entertainment needs on terminal devices have gradually shifted from the image level to the video level, that is, from image style transfer processing of a single image to image style transfer processing of a video. Compared with an image style transfer task, a video style transfer task must consider not only the stylization effect of each image but also the stability across the multiple frames of the video, so as to ensure the smoothness of the video after image style transfer processing.

Therefore, how to improve the stability of a video after image style transfer processing has become an urgent problem to be solved.

Summary

This application provides a method for training a style transfer model, and a method and device for video style transfer. By introducing a low-rank loss function in the process of training a style transfer model for video, the stability of the style-transferred video can be aligned with that of the original video, thereby improving the stability of the video obtained by the target style transfer model after style transfer processing.

According to a first aspect, a method for training a style transfer model is provided, including: obtaining training data, where the training data includes N frames of sample content images, a sample style image, and N frames of composite images, the N frames of composite images are images obtained by performing image style transfer processing on the N frames of sample content images according to the sample style image, and N is an integer greater than or equal to 2; performing image style transfer processing on the N frames of sample content images according to the sample style image through a neural network model, to obtain N frames of predicted composite images; and determining the parameters of the neural network model according to an image loss function between the N frames of sample content images and the N frames of predicted composite images,

where the image loss function includes a low-rank loss function, the low-rank loss function represents the difference between a first low-rank matrix and a second low-rank matrix, the first low-rank matrix is obtained based on the N frames of sample content images and optical flow information, the second low-rank matrix is obtained based on the N frames of predicted composite images and the optical flow information, and the optical flow information represents the position differences of corresponding pixels between adjacent frames among the N frames of sample content images.

It should be understood that, in a matrix formed from multiple frames of images, a low-rank matrix can represent the regions that appear in all N frames and are not motion boundaries, while a sparse matrix can represent the regions that appear intermittently across the N frames; for example, regions that newly appear or disappear at image borders due to camera movement, or the boundary regions of moving objects.
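To make these two notions concrete, the standard low-rank-plus-sparse decomposition from the robust-PCA literature can be written as follows; this formulation is an illustrative convention, not an equation quoted from this application:

$$X = L + S$$

where the columns of $X$ are the N frames warped to a common reference frame, $L$ is a low-rank matrix capturing the content visible in every frame away from motion boundaries, and $S$ is a sparse matrix capturing the intermittently appearing regions.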

In the embodiments of this application, a low-rank loss function is introduced when training the target style transfer model used for video style transfer processing. By introducing the low-rank loss function, the regions that appear in all adjacent frames of the video to be processed and are not motion boundaries remain the same after style transfer processing; that is, the rank of those regions in the style-transferred video approaches the rank of the same regions in the video to be processed, which improves the stability of the video after style transfer processing.

It should be understood that image style transfer processing refers to fusing the image content of a content image A, that is, an image for which style transfer is desired, with the image style of a style image B, thereby generating a composite image C, also referred to as a fused image C, that has the content of image A and the style of image B.

Here, the style image may refer to the reference image for style transfer processing; the style of an image may include the texture features of the image and its artistic form of expression, for example the style of a famous painting, where the artistic forms may include cartoon, comic, oil painting, watercolor, ink wash, and other image styles. The content image may refer to the image that requires style transfer; the content of an image may refer to the semantic information in the image, which may include the high-frequency information, low-frequency information, and so on of the content image.

In a possible implementation, the first low-rank matrix is obtained based on the N frames of sample content images and the optical flow information. For example, the optical flow information between adjacent frames of the N frames of sample content images is computed, and mask information is then derived from the optical flow information, where the optical flow information represents the motion of corresponding pixels between adjacent frames, and the mask information represents the regions that change between two consecutive frames as determined from the optical flow information. Further, the N frames of sample content images are mapped to a fixed reference frame according to the optical flow information and the mask information, each mapped frame is flattened into a vector, and the vectors are stacked as columns to form a matrix; this matrix is the first low-rank matrix. Similarly, the second low-rank matrix can be obtained based on the N frames of predicted composite images and the optical flow information: the N frames of predicted composite images are mapped to the fixed reference frame according to the optical flow information and the mask information, each mapped frame is flattened into a vector, and the vectors are stacked as columns to form a matrix; this matrix is the second low-rank matrix. Here, the optical flow information represents the position differences of corresponding pixels between adjacent frames among the N frames of sample content images.
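A minimal PyTorch sketch of this construction is given below. It assumes single-image batches, that `flows[i]` maps frame i to the fixed reference frame, and that `masks[i]` marks the regions valid after warping; the final nuclear-norm comparison is one plausible way to measure the "difference between the two low-rank matrices", not a formula given by this application.

```python
import torch
import torch.nn.functional as F

def warp_to_reference(frame, flow):
    # Backward-warp one frame (1, C, H, W) to the reference frame using a
    # dense optical flow field (1, 2, H, W) via bilinear sampling.
    _, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(frame.device)  # (2, H, W)
    coords = base.unsqueeze(0) + flow                  # shifted sampling positions
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0            # normalize to [-1, 1]
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)               # (1, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)

def stack_as_matrix(frames, flows, masks):
    # Warp every frame to the fixed reference frame, keep only the regions
    # the mask marks as valid, flatten each result into a vector, and stack
    # the vectors as the columns of one matrix.
    cols = [(warp_to_reference(f, fl) * m).flatten()
            for f, fl, m in zip(frames, flows, masks)]
    return torch.stack(cols, dim=1)                    # one column per frame

def low_rank_loss(content_frames, predicted_frames, flows, masks):
    m1 = stack_as_matrix(content_frames, flows, masks)    # first low-rank matrix
    m2 = stack_as_matrix(predicted_frames, flows, masks)  # second low-rank matrix
    # Compare the nuclear norms (sum of singular values, the usual convex
    # surrogate for rank) of the two matrices.
    return (torch.linalg.matrix_norm(m1, ord="nuc")
            - torch.linalg.matrix_norm(m2, ord="nuc")).abs()
```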

With reference to the first aspect, in some implementations of the first aspect, the image loss function further includes a residual loss function. The residual loss function is obtained from the difference between a first sample composite image and a second sample composite image, where the first sample composite image refers to an image obtained by performing image style transfer processing on the N frames of sample content images through a first model, and the second sample composite image refers to an image obtained by performing image style transfer processing on the N frames of sample content images through a second model. The first model and the second model are image style transfer models pre-trained according to the sample style image; the second model includes an optical flow module, the first model does not include the optical flow module, and the optical flow module is used to determine the optical flow information.

In the embodiments of this application, the goal of introducing the residual loss function when training the target style transfer model is to enable the neural network model, during training, to learn the difference between the composite images output by the style transfer model that includes the optical flow module and the style transfer model that does not, thereby improving the stability of the style-transferred video produced by the target style transfer model.

It should be understood that the difference between the first sample composite image and the second sample composite image may refer to the difference between the corresponding pixel values of the first sample composite image and the second sample composite image.

In a possible implementation, the first model and the second model may use the same sample content images and sample style image in the training phase; for example, in the training phase the first model and the second model may be the same model. In the testing phase, however, the second model also needs to compute the optical flow information between the multiple frames of sample content images, while the first model does not.

With reference to the first aspect, in some implementations of the first aspect, the first model and the second model are pre-trained teacher models, and the target style transfer model refers to a target student model obtained by training a student model to be trained according to the residual loss function and a knowledge distillation algorithm.

In a possible implementation, the target style transfer model may refer to the target student model. When training the target student model, a student model to be trained can be trained according to a pre-trained first teacher model (without the optical flow module), a pre-trained second teacher model (with the optical flow module), and a pre-trained basic model, thereby obtaining the target student model. The student model to be trained, the pre-trained basic model, and the target student model all have the same network structure, and the student model to be trained is trained with the low-rank loss function, the residual loss function, and the perceptual loss function described herein, thereby obtaining the target student model.

Here, the pre-trained basic model may refer to a style transfer model trained in advance with a perceptual loss function that does not include an optical flow module in the testing phase; alternatively, it may refer to a style transfer model trained in advance with a perceptual loss function and an optical flow loss function that does not include an optical flow module in the testing phase. The perceptual loss function represents the content loss between a composite image and the content image and the style loss between the composite image and the style image; the optical flow loss function represents the differences between corresponding pixels of composite images of adjacent frames.

In a possible implementation, in the process of training the student model to be trained, the residual loss function described above makes the difference between the transfer results (also called composite images) output by the student model to be trained and the pre-trained basic model continually approach the difference between the transfer results output by the second model and the first model.

In the embodiments of this application, the target style transfer model may refer to the target student model. By adopting a teacher-student knowledge distillation method, the difference between the style transfer results output by the student model to be trained and the pre-trained basic model continually approaches the difference between the style transfer results output by the teacher model that includes the optical flow module and the teacher model that does not. This training method can effectively avoid the ghosting caused by inconsistent styles between the teacher model and the student model.

With reference to the first aspect, in some implementations of the first aspect, the residual loss function is obtained according to the following equation:

$$L_{res} = \sum_i \left\| \left( N_T(x_i) - \widetilde{N}_T(x_i) \right) - \left( N_S(x_i) - \widetilde{N}_S(x_i) \right) \right\|$$

where $L_{res}$ denotes the residual loss function; $N_T$ denotes the second model; $\widetilde{N}_T$ denotes the first model; $N_S$ denotes the student model to be trained; $\widetilde{N}_S$ denotes the pre-trained basic model, whose network structure is the same as that of the student model to be trained; and $x_i$ denotes the i-th frame of sample content image included in the sample video, where i is a positive integer.
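A minimal PyTorch sketch of this residual loss follows. It assumes the two teachers and the basic model are frozen, that the teacher with the optical flow module consumes precomputed flow internally, and that a squared L2 norm is used (the exact norm is an assumption):

```python
import torch

def residual_loss(x, teacher_with_flow, teacher_no_flow, student, base_student):
    # Distillation-style residual loss: the student-minus-base residual is
    # pushed toward the teacher-with-flow minus teacher-without-flow residual.
    # Only `student` receives gradients; the other three models are frozen.
    with torch.no_grad():
        teacher_residual = teacher_with_flow(x) - teacher_no_flow(x)
        base_out = base_student(x)
    student_residual = student(x) - base_out
    return (teacher_residual - student_residual).pow(2).sum()
```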

With reference to the first aspect, in some implementations of the first aspect, the image loss function further includes a perceptual loss function, where the perceptual loss function includes a content loss and a style loss: the content loss represents the image content differences between the N frames of predicted composite images and their corresponding N frames of sample content images, and the style loss represents the image style differences between the N frames of predicted composite images and the sample style image.
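A common VGG-based instantiation of such a perceptual loss is sketched below; the specific VGG-16 layers and the use of Gram matrices for style follow the usual convention in the style transfer literature and are assumptions, not details specified by this application.

```python
import torch
import torch.nn.functional as F
import torchvision

class PerceptualLoss(torch.nn.Module):
    # Content loss: feature-map distance between a predicted composite frame
    # and its sample content frame. Style loss: Gram-matrix distance between
    # the predicted composite frame and the sample style image.
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.style_layers = (3, 8, 15, 22)  # relu1_2, relu2_2, relu3_3, relu4_3
        self.content_layer = 15             # relu3_3

    def _features(self, x):
        feats = {}
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in self.style_layers:
                feats[i] = x
        return feats

    @staticmethod
    def _gram(f):
        b, c, h, w = f.shape
        f = f.view(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    def forward(self, predicted, content, style):
        fp, fc, fs = self._features(predicted), self._features(content), self._features(style)
        content_loss = F.mse_loss(fp[self.content_layer], fc[self.content_layer])
        style_loss = sum(F.mse_loss(self._gram(fp[i]), self._gram(fs[i]))
                         for i in self.style_layers)
        return content_loss + style_loss
```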

With reference to the first aspect, in some implementations of the first aspect, the image loss function is obtained by weighting the low-rank loss function, the residual loss function, and the perceptual loss function.
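A one-function sketch of this weighted combination; the weights are hyperparameters assumed here, not values given by this application:

```python
def total_loss(l_rank, l_res, l_perc, lam_rank=1.0, lam_res=1.0, lam_perc=1.0):
    # Weighted sum of the low-rank, residual, and perceptual losses; calling
    # .backward() on the result drives the backpropagation updates.
    return lam_rank * l_rank + lam_res * l_res + lam_perc * l_perc
```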

With reference to the first aspect, in some implementations of the first aspect, the parameters of the target style transfer model are obtained through multiple iterations of a backpropagation algorithm based on the image loss function.

According to a second aspect, a method for video style transfer is provided, including: obtaining a video to be processed, where the video to be processed includes N frames of content images to be processed, and N is an integer greater than or equal to 2; performing image style transfer processing on the N frames of content images to be processed according to a target style transfer model, to obtain N frames of composite images; and obtaining, according to the N frames of composite images, the style-transferred video corresponding to the video to be processed,

where the parameters of the target style transfer model are determined according to an image loss function of style transfer processing performed by the target style transfer model on N frames of sample content images; the image loss function includes a low-rank loss function, the low-rank loss function represents the difference between a first low-rank matrix and a second low-rank matrix, the first low-rank matrix is obtained based on the N frames of sample content images and optical flow information, the second low-rank matrix is obtained based on N frames of predicted composite images and the optical flow information, the optical flow information represents the position differences of corresponding pixels between adjacent frames among the N frames of sample content images, and the N frames of predicted composite images refer to images obtained by performing image style transfer processing on the N frames of sample content images according to a sample style image through the target style transfer model.

It should be noted that image style transfer refers to fusing the image content of a content image A with the image style of a style image B, thereby generating a composite image C that has the content of image A and the style of image B. The style of an image may include information such as its texture features; the content of an image may refer to the semantic information in the image, which may include the high-frequency information, low-frequency information, and so on of the content image.

It should be understood that, in a matrix formed from multiple frames of images, a low-rank matrix can represent the regions that appear in all N frames and are not motion boundaries, while a sparse matrix can represent the regions that appear intermittently across the N frames; for example, regions that newly appear or disappear at image borders due to camera movement, or the boundary regions of moving objects.

In the embodiments of this application, a low-rank loss function is introduced when training the target style transfer model used for video style transfer processing. By introducing the low-rank loss function, the regions that appear in all adjacent frames of the video to be processed and are not motion boundaries remain the same after style transfer processing; that is, the rank of those regions in the style-transferred video approaches the rank of the same regions in the video to be processed, which improves the stability of the video after style transfer processing.

On the other hand, the target style transfer model provided in the embodiments of this application does not need to compute the optical flow information between the multiple frames of images included in the video to be processed during style transfer processing. Therefore, while improving stability, the target style transfer model provided in the embodiments of this application also shortens the style transfer processing time and improves the running efficiency of the model.
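A hedged sketch of the deployed inference loop follows; the frame I/O via OpenCV and the model's input/output conventions (RGB-like tensor in [0, 1]) are assumptions. Note that each frame is stylized independently, with no optical flow computed at inference time, which is what makes the trained model fast on terminal devices.

```python
import cv2
import torch

def stylize_video(model, in_path, out_path, device="cuda"):
    # Read the video to be processed frame by frame, stylize each frame with
    # the trained target style transfer model, and write the result.
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    model = model.to(device).eval()
    with torch.no_grad():
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            x = (torch.from_numpy(frame).permute(2, 0, 1)
                 .float().div(255).unsqueeze(0).to(device))
            y = model(x).clamp(0, 1).squeeze(0).permute(1, 2, 0).cpu().numpy()
            writer.write((y * 255).astype("uint8"))
    cap.release()
    writer.release()
```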

In a possible implementation, the video to be processed may be a video captured by the camera of an electronic device, or the video to be processed may be a video obtained from inside the electronic device (for example, a video stored in the album of the electronic device, or a video obtained by the electronic device from the cloud).

It should be understood that the video to be processed may be any video that requires style transfer; this application does not impose any limitation on the source of the video to be processed.

With reference to the second aspect, in some implementations of the second aspect, the image loss function further includes a residual loss function. The residual loss function is obtained from the difference between a first sample composite image and a second sample composite image, where the first sample composite image refers to an image obtained by performing image style transfer processing on the N frames of sample content images through a first model, and the second sample composite image refers to an image obtained by performing image style transfer processing on the N frames of sample content images through a second model. The first model and the second model are image style transfer models pre-trained according to the sample style image; the second model includes an optical flow module, the first model does not include the optical flow module, and the optical flow module is used to determine the optical flow information.

In the embodiments of this application, the goal of introducing the residual loss function when training the target style transfer model is to enable the neural network model, during training, to learn the difference between the composite images output by the style transfer model that includes the optical flow module and the style transfer model that does not, thereby improving the stability of the style-transferred video produced by the target style transfer model.

It should be understood that the difference between the first sample composite image and the second sample composite image may refer to the difference between their corresponding pixel values. In a possible implementation, the first model and the second model may use the same sample content images and sample style image in the training phase; for example, in the training phase the first model and the second model may be the same model. In the testing phase, however, the second model also needs to compute the optical flow information between the multiple frames of sample content images, while the first model does not.

With reference to the second aspect, in some implementations of the second aspect, the first model and the second model are pre-trained teacher models, and the target style transfer model refers to a target student model obtained by training a student model to be trained according to the residual loss function and a knowledge distillation algorithm.

It should be noted that the network structure of the above student model and the target student model may be the same; that is, the student model may refer to a pre-trained style transfer model that does not need optical flow information as an input in the testing phase, while the target student model refers to the model obtained by further training the student model with the residual loss function and the low-rank loss function described above.

In a possible implementation, the pre-trained student model may be a student model pre-trained with a perceptual loss function, where the perceptual loss function represents the video stylization effect; that is, it may represent the content differences between the sample composite images and the sample content images and the style differences between the sample composite images and the sample style image.

In a possible implementation, the pre-trained student model may be a student model pre-trained with a perceptual loss function and an optical flow loss function, where the optical flow loss function represents the differences between corresponding pixels of composite images of adjacent frames.

With reference to the second aspect, in some implementations of the second aspect, the residual loss function is obtained according to the following equation:

$$L_{res} = \sum_i \left\| \left( N_T(x_i) - \widetilde{N}_T(x_i) \right) - \left( N_S(x_i) - \widetilde{N}_S(x_i) \right) \right\|$$

where $L_{res}$ denotes the residual loss function; $N_T$ denotes the second model; $\widetilde{N}_T$ denotes the first model; $N_S$ denotes the student model to be trained; $\widetilde{N}_S$ denotes the pre-trained basic model, whose network structure is the same as that of the student model to be trained; and $x_i$ denotes the i-th frame of sample content image included in the sample video, where i is a positive integer.

In a possible implementation, the target style transfer model may refer to the target student model. When training the target student model, a student model to be trained can be trained according to a pre-trained first teacher model (without the optical flow module), a pre-trained second teacher model (with the optical flow module), and a pre-trained basic model, thereby obtaining the target student model. The student model to be trained, the pre-trained basic model, and the target student model all have the same network structure, and the student model to be trained is trained with the low-rank loss function, the residual loss function, and the perceptual loss function described herein, thereby obtaining the target student model.

Here, the pre-trained basic model may refer to a style transfer model trained in advance with a perceptual loss function that does not include an optical flow module in the testing phase; alternatively, it may refer to a style transfer model trained in advance with a perceptual loss function and an optical flow loss function that does not include an optical flow module in the testing phase. The perceptual loss function represents the content loss between a composite image and the content image and the style loss between the composite image and the style image; the optical flow loss function represents the differences between corresponding pixels of composite images of adjacent frames.

In a possible implementation, in the process of training the student model to be trained, the residual loss function described above makes the difference between the transfer results (also called composite images) output by the student model to be trained and the pre-trained basic model continually approach the difference between the transfer results output by the second model and the first model.

In the embodiments of this application, the target style transfer model may refer to the target student model. By adopting a teacher-student knowledge distillation method, the difference between the style transfer results output by the student model to be trained and the pre-trained basic model continually approaches the difference between the style transfer results output by the teacher model that includes the optical flow module and the teacher model that does not. This training method can effectively avoid the ghosting caused by inconsistent styles between the teacher model and the student model.

With reference to the second aspect, in some implementations of the second aspect, the image loss function further includes a perceptual loss function, where the perceptual loss function includes a content loss and a style loss: the content loss represents the image content differences between the N frames of predicted composite images and their corresponding N frames of sample content images, and the style loss represents the image style differences between the N frames of predicted composite images and the sample style image.

With reference to the second aspect, in some implementations of the second aspect, the image loss function is obtained by weighting the low-rank loss function, the residual loss function, and the perceptual loss function.

With reference to the second aspect, in some implementations of the second aspect, the parameters of the target style transfer model are obtained through multiple iterations of a backpropagation algorithm based on the image loss function.

According to a third aspect, a training device for a style transfer model is provided, including: an obtaining unit, configured to obtain training data, where the training data includes N frames of sample content images, a sample style image, and N frames of composite images, the N frames of composite images are images obtained by performing image style transfer processing on the N frames of sample content images according to the sample style image, and N is an integer greater than or equal to 2; and a processing unit, configured to perform image style transfer processing on the N frames of sample content images according to the sample style image through a neural network model, to obtain N frames of predicted composite images, and to determine the parameters of the neural network model according to an image loss function between the N frames of sample content images and the N frames of predicted composite images, where the image loss function includes a low-rank loss function, the low-rank loss function represents the difference between a first low-rank matrix and a second low-rank matrix, the first low-rank matrix is obtained based on the N frames of sample content images and optical flow information, the second low-rank matrix is obtained based on the N frames of predicted composite images and the optical flow information, and the optical flow information represents the position differences of corresponding pixels between adjacent frames among the N frames of sample content images.

In a possible implementation, the training device further includes functional units/modules configured to perform the method in the first aspect and any one of the implementations of the first aspect.

It should be understood that the extensions, limitations, explanations, and descriptions of the relevant content in the first aspect above also apply to the same content in the third aspect.

According to a fourth aspect, a device for video style transfer is provided, including: an obtaining unit, configured to obtain a video to be processed, where the video to be processed includes N frames of content images to be processed, and N is an integer greater than or equal to 2; and a processing unit, configured to perform image style transfer processing on the N frames of content images to be processed according to a target style transfer model, to obtain N frames of composite images, and to obtain, according to the N frames of composite images, the style-transferred video corresponding to the video to be processed,

where the parameters of the target style transfer model are determined according to an image loss function of style transfer processing performed by the target style transfer model on N frames of sample content images; the image loss function includes a low-rank loss function, the low-rank loss function represents the difference between a first low-rank matrix and a second low-rank matrix, the first low-rank matrix is obtained based on the N frames of sample content images and optical flow information, the second low-rank matrix is obtained based on N frames of predicted composite images and the optical flow information, the optical flow information represents the position differences of corresponding pixels between adjacent frames among the N frames of sample content images, and the N frames of predicted composite images refer to images obtained by performing image style transfer processing on the N frames of sample content images according to a sample style image through the target style transfer model.

In a possible implementation, the device further includes functional units/modules configured to perform the method in the second aspect and any one of the implementations of the second aspect.

It should be understood that the extensions, limitations, explanations, and descriptions of the relevant content in the second aspect above also apply to the same content in the fourth aspect.

According to a fifth aspect, a training device for a style transfer model is provided, including: a memory, configured to store a program; and a processor, configured to execute the program stored in the memory, where when the program stored in the memory is executed, the processor is configured to: obtain training data, where the training data includes N frames of sample content images, a sample style image, and N frames of composite images, the N frames of composite images are images obtained by performing image style transfer processing on the N frames of sample content images according to the sample style image, and N is an integer greater than or equal to 2; perform image style transfer processing on the N frames of sample content images according to the sample style image through a neural network model, to obtain N frames of predicted composite images; and determine the parameters of the neural network model according to an image loss function between the N frames of sample content images and the N frames of predicted composite images, where the image loss function includes a low-rank loss function, the low-rank loss function represents the difference between a first low-rank matrix and a second low-rank matrix, the first low-rank matrix is obtained based on the N frames of sample content images and optical flow information, the second low-rank matrix is obtained based on the N frames of predicted composite images and the optical flow information, and the optical flow information represents the position differences of corresponding pixels between adjacent frames among the N frames of sample content images.

In a possible implementation, the processor included in the training device is further configured to perform the method in the first aspect and any one of the implementations of the first aspect.

It should be understood that the extensions, limitations, explanations, and descriptions of the relevant content in the first aspect above also apply to the same content in the fifth aspect.

According to a sixth aspect, a device for video style transfer is provided, including: a memory, configured to store a program; and a processor, configured to execute the program stored in the memory, where when the program stored in the memory is executed, the processor is configured to: obtain a video to be processed, where the video to be processed includes N frames of content images to be processed, and N is an integer greater than or equal to 2; perform image style transfer processing on the N frames of content images to be processed according to a target style transfer model, to obtain N frames of composite images; and obtain, according to the N frames of composite images, the style-transferred video corresponding to the video to be processed, where the parameters of the target style transfer model are determined according to an image loss function of style transfer processing performed by the target style transfer model on N frames of sample content images, the image loss function includes a low-rank loss function, the low-rank loss function represents the difference between a first low-rank matrix and a second low-rank matrix, the first low-rank matrix is obtained based on the N frames of sample content images and optical flow information, the second low-rank matrix is obtained based on the N frames of predicted composite images and the optical flow information, the optical flow information represents the position differences of corresponding pixels between adjacent frames among the N frames of sample content images, and the N frames of predicted composite images refer to images obtained by performing image style transfer processing on the N frames of sample content images according to a sample style image through the target style transfer model.

In a possible implementation, the processor included in the device is further configured to perform the method in any one of the implementations of the second aspect.

It should be understood that the extensions, limitations, explanations, and descriptions of the relevant content in the second aspect above also apply to the same content in the sixth aspect.

According to a seventh aspect, a computer-readable medium is provided. The computer-readable medium stores program code for execution by a device, and the program code includes instructions for performing the method in any one of the implementations of the first aspect or the second aspect.

According to an eighth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, the computer is caused to perform the method in any one of the implementations of the first aspect or the second aspect.

According to a ninth aspect, a chip is provided. The chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory, to perform the method in any one of the implementations of the first aspect or the second aspect.

Optionally, in an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory. When the instructions are executed, the processor is configured to perform the method in any one of the implementations of the first aspect or the second aspect.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of an artificial intelligence framework provided by an embodiment of this application;

FIG. 2 is a schematic diagram of an application scenario provided by an embodiment of this application;

FIG. 3 shows a system architecture provided by an embodiment of this application;

FIG. 4 is a schematic structural diagram of a convolutional neural network provided by an embodiment of this application;

FIG. 5 is a schematic diagram of a chip hardware structure provided by an embodiment of this application;

FIG. 6 shows a system architecture provided by an embodiment of this application;

FIG. 7 is a schematic flowchart of the method for training a style transfer model provided by an embodiment of this application;

FIG. 8 is a schematic diagram of the training process of the style transfer model provided by an embodiment of this application;

FIG. 9 is a schematic flowchart of the method for video style transfer provided by an embodiment of this application;

FIG. 10 is a schematic diagram of the training phase and the testing phase provided by an embodiment of this application;

FIG. 11 is a schematic block diagram of a device for video style transfer provided by an embodiment of this application;

FIG. 12 is a schematic block diagram of a training device for a style transfer model provided by an embodiment of this application;

FIG. 13 is a schematic block diagram of a device for video style transfer provided by an embodiment of this application;

FIG. 14 is a schematic block diagram of a training device for a style transfer model provided by an embodiment of this application.

Detailed Description

The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative efforts shall fall within the protection scope of this application.

FIG. 1 is a schematic diagram of an artificial intelligence framework. The framework describes the overall workflow of an artificial intelligence system and is applicable to general requirements in the artificial intelligence field.

The artificial intelligence framework 100 is described in detail below along two dimensions: the "intelligent information chain" (horizontal axis) and the "information technology (IT) value chain" (vertical axis).

The "intelligent information chain" reflects the series of processes from data acquisition to processing. For example, it may be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data undergoes a condensation process of "data-information-knowledge-wisdom".

The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technical implementations) to the industrial ecology of the system.

(1) Infrastructure 110

The infrastructure provides computing power support for the artificial intelligence system, enables communication with the external world, and provides support through a basic platform.

The infrastructure can communicate with the outside through sensors, and the computing power of the infrastructure can be provided by smart chips.

The smart chips here may be hardware acceleration chips such as a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), and a field programmable gate array (FPGA).

The basic platform of the infrastructure may include related platform guarantees and support such as distributed computing frameworks and networks, and may include cloud storage and computing, interconnection networks, and the like.

For example, for the infrastructure, data can be obtained through sensors and external communication, and then provided to smart chips in the distributed computing system provided by the basic platform for computation.

(2) Data 120

Data from the layer above the infrastructure indicates the data sources in the artificial intelligence field. The data involves graphics, images, speech, and text, as well as Internet-of-things data from conventional devices, including service data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.

(3) Data processing 130

The foregoing data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and other processing methods.

Machine learning and deep learning may perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, and the like on data.

Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, performing machine thinking and problem solving with formalized information according to a reasoning control strategy; typical functions are searching and matching.

Decision-making refers to the process of making decisions after reasoning over intelligent information, and usually provides functions such as classification, ranking, and prediction.

(4) General capabilities 140

After the data has undergone the data processing mentioned above, some general capabilities can be formed based on the results of the data processing, for example, an algorithm or a general system, such as translation, text analysis, computer vision processing, speech recognition, and image recognition.

(5) Smart products and industry applications 150

Smart products and industry applications refer to products and applications of artificial intelligence systems in various fields, and are an encapsulation of the overall artificial intelligence solution that productizes intelligent-information decision-making and implements practical applications. The application fields mainly include smart manufacturing, smart transportation, smart home, smart healthcare, smart security, autonomous driving, safe city, smart terminals, and the like.

FIG. 2 is a schematic diagram of an application scenario provided by an embodiment of the present application.

As shown in FIG. 2, the video style transfer method of the embodiments of the present application can be applied to a smart terminal. For example, a to-be-processed video captured by the camera of the smart terminal, or a to-be-processed video stored in the album of the smart terminal, is input into the target style transfer model provided by the embodiments of the present application to obtain a style-transferred video. By using the target style transfer model provided by the embodiments of the present application, the stability of the style-transferred video can be ensured, that is, the smoothness of the resulting video can be ensured.

In an example, the video style transfer method provided in the embodiments of the present application can be applied in an offline scenario.

For example, a to-be-processed video is obtained and input into the target style transfer model to obtain the style-transferred video, that is, a stable stylized output video.

In an example, the video style transfer method provided in the embodiments of the present application can be applied in an online scenario.

For example, a video recorded in real time by the smart terminal is obtained and input into the target style transfer model to obtain the style-transferred video output in real time; this can be used, for example, in scenarios such as real-time exhibition displays.

For example, during an online video call on a smart terminal, the user video captured by the camera in real time can be input into the target style transfer model to obtain the style-transferred output video. A stable stylized video can thus be shown to others in real time, adding to the fun.

The above target style transfer model is a pre-trained model obtained through the training method for a style transfer model provided in the embodiments of the present application.

Exemplarily, the smart terminal may be mobile or fixed. For example, the smart terminal may be a mobile phone with an image processing function, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), a self-driving vehicle, or the like. This is not limited in the embodiments of the present application.

It should be understood that the foregoing is an illustration of application scenarios and does not limit the application scenarios of the present application in any way.

Since the embodiments of the present application involve extensive application of neural networks, for ease of understanding, related terms and concepts of neural networks that may be involved in the embodiments of the present application are first introduced below.

1. Neural network

A neural network may be composed of neural units. A neural unit may refer to an operation unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the operation unit may be:

$h_{W,b}(x) = f\left(W^{T}x\right) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, used to introduce nonlinearity into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by connecting many such single neural units, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract features of the local receptive field; the local receptive field may be a region composed of several neural units.
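For illustration only, the neuron formula above can be written out in a few lines; the following is a minimal NumPy sketch, where all variable names and values are example assumptions and not part of the embodiments:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation f, mapping the weighted sum to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, W, b):
    # h = f(sum_s W_s * x_s + b): weighted sum of inputs plus bias,
    # passed through the activation function
    return sigmoid(np.dot(W, x) + b)

x = np.array([0.5, -1.2, 3.0])   # inputs x_s
W = np.array([0.8, 0.1, -0.4])   # weights W_s
b = 0.2                          # bias
print(neuron_output(x, W, b))
```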

2. Deep neural network

A deep neural network (DNN), also known as a multilayer neural network, can be understood as a neural network with multiple hidden layers. Dividing the DNN by the positions of different layers, the neural network layers inside the DNN can be classified into three categories: input layer, hidden layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.

Although the DNN looks complicated, the work of each layer is actually not complicated. In simple terms, each layer computes the following linear relationship expression: $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are also large numbers of coefficients $W$ and offset vectors $\vec{b}$. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$. The superscript 3 represents the layer where the coefficient $W$ is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.

In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W^{L}_{jk}$.

It should be noted that the input layer has no $W$ parameters. In a deep neural network, more hidden layers enable the network to better characterize complex situations in the real world. In theory, a model with more parameters has higher complexity and greater "capacity", which means that it can accomplish more complex learning tasks. Training a deep neural network is the process of learning the weight matrices; the ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors $W$ of many layers).
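As an illustrative sketch (not part of the embodiments), the per-layer expression $\vec{y} = \alpha(W\vec{x} + \vec{b})$ can be chained across layers as follows; the layer sizes, the ReLU activation, and the random weights are all assumptions made for the example:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, layers):
    # Each layer computes y = alpha(W x + b); the weight entry W[j, k]
    # connects neuron k of layer L-1 to neuron j of layer L, matching
    # the W^L_{jk} indexing described above.
    for W, b in layers:
        x = relu(W @ x + b)
    return x

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), np.zeros(4)),   # hidden layer: 3 -> 4
          (rng.standard_normal((2, 4)), np.zeros(2))]   # output layer: 4 -> 2
print(forward(rng.standard_normal(3), layers))
```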

3. Convolutional neural network

A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and subsampling layers; the feature extractor can be regarded as a filter. A convolutional layer refers to a layer of neurons in a convolutional neural network that performs convolution on the input signal. In a convolutional layer of a convolutional neural network, a neuron may be connected to only some of the neurons in adjacent layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernel. Weight sharing can be understood as extracting image information in a position-independent manner. The convolution kernel can be initialized as a matrix of random size, and during the training of the convolutional neural network the convolution kernel can learn reasonable weights. In addition, a direct benefit of weight sharing is reducing the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
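The weight-sharing idea can be made concrete with a short sketch; here PyTorch's Conv2d is used purely as an illustration, and the channel counts and kernel size are arbitrary example choices:

```python
import torch
import torch.nn as nn

# One convolutional layer: 16 kernels (feature planes), each a 3x3
# weight matrix shared across all spatial positions of the input.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

image = torch.randn(1, 3, 64, 64)   # one RGB image
features = conv(image)              # shape (1, 16, 64, 64)
print(features.shape)
```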

4. Loss function

In the process of training a deep neural network, because it is hoped that the output of the deep neural network is as close as possible to the value actually desired, the weight vector of each layer of the neural network can be updated by comparing the predicted value of the current network with the actually desired target value and adjusting according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make it predict lower, and the adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value". This is the loss function or objective function, an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
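As a minimal illustration of a loss function, the following sketch computes a mean squared error between a predicted value and a target value; MSE is only one common choice and is not mandated by the embodiments:

```python
import torch
import torch.nn.functional as F

prediction = torch.tensor([2.5, 0.0, 1.8])
target = torch.tensor([3.0, -0.5, 2.0])

# Mean squared error: training drives this value down by adjusting
# the weight vectors of every layer.
loss = F.mse_loss(prediction, target)
print(loss.item())
```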

5. Backpropagation algorithm

A neural network can use the error backpropagation (BP) algorithm to correct the values of the parameters in the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller.

Specifically, forward propagation of the input signal up to the output produces an error loss, and the parameters in the initial neural network model are updated by backpropagating the error loss information, so that the error loss converges. The backpropagation algorithm is a backpropagation movement dominated by the error loss, aiming to obtain the optimal parameters of the neural network model, for example, the weight matrices.
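The following is a minimal sketch of one training step, showing forward propagation, backpropagation of the error loss, and the weight update; the model shape, optimizer, and loss function are illustrative assumptions:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x, y = torch.randn(32, 8), torch.randn(32, 1)

prediction = model(x)                          # forward propagation
loss = nn.functional.mse_loss(prediction, y)   # error loss at the output
optimizer.zero_grad()
loss.backward()                                # backpropagate the error loss
optimizer.step()                               # update weights to shrink the loss
```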

6. Image style transfer

Image style transfer refers to fusing the image content of a content image A with the image style of a style image B to generate a composite image C that has the content of image A and the style of image B.

Exemplarily, performing image style transfer on content image 1 according to style image 1 yields composite image 1, where composite image 1 includes the content of content image 1 and the style of style image 1. Similarly, performing image style transfer on content image 1 according to style image 2 yields composite image 2, where composite image 2 includes the content of content image 1 and the style of style image 2.

The style image may refer to the reference image for style transfer. The style of an image may include the texture features of the image and the artistic form of the image; for example, the style of a famous painting, where the artistic form may include image styles such as cartoon, comic, oil painting, watercolor, and ink wash. The content image may refer to the image on which style transfer is to be performed; the content of an image may refer to the semantic information in the image, which may include the high-frequency information, low-frequency information, and the like of the content image.
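The embodiments do not prescribe here how content and style are compared; a common formulation, used below purely as a sketch, measures style with Gram matrices of CNN feature maps and content with the raw feature maps. The tensor shapes are arbitrary example values:

```python
import torch
import torch.nn.functional as F

def gram_matrix(features):
    # features: (channels, height, width) feature map from a CNN layer.
    # The Gram matrix of channel-wise inner products captures texture
    # (style) while discarding spatial layout (content).
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.t()) / (c * h * w)

# Style loss: match Gram matrices of the synthesized image and the style
# image B; a content loss would instead match the raw feature maps of the
# synthesized image and the content image A.
feat_style = torch.randn(64, 32, 32)
feat_synth = torch.randn(64, 32, 32)
style_loss = F.mse_loss(gram_matrix(feat_synth), gram_matrix(feat_style))
print(style_loss.item())
```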

7. Optical flow information

Optical flow (optic flow) is used to represent the instantaneous velocity of the pixel motion of a moving object in space on the observation imaging plane. It is a method that uses the change of pixels of an image sequence in the time domain and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame, thereby computing the motion information of objects between adjacent frames.
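As an illustrative sketch of how optical flow relates adjacent frames, the following function warps the previous frame toward the current one by sampling at flow-displaced pixel positions; the use of PyTorch's grid_sample and the tensor shapes are assumptions of this example, not a formulation given by the embodiments:

```python
import torch
import torch.nn.functional as F

def warp(frame, flow):
    # frame: (1, C, H, W) previous frame; flow: (1, 2, H, W) per-pixel
    # (dx, dy) offsets. Sample the previous frame at the displaced
    # positions to predict the current frame.
    _, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32),
                            indexing="ij")
    x_new = xs + flow[:, 0]                  # (1, H, W)
    y_new = ys + flow[:, 1]
    # grid_sample expects coordinates normalized to [-1, 1], in (x, y) order
    gx = 2.0 * x_new / (w - 1) - 1.0
    gy = 2.0 * y_new / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)     # (1, H, W, 2)
    return F.grid_sample(frame, grid, align_corners=True)
```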

8. Knowledge distillation

Knowledge distillation is a key technique for miniaturizing deep learning models so that they meet the deployment requirements of terminal devices. Compared with compression techniques such as quantization and sparsification, it can achieve model compression without requiring specific hardware support. Knowledge distillation adopts a teacher-student learning strategy, where the teacher model has a large number of parameters and generally cannot meet deployment requirements, while the student model has few parameters and can be deployed directly. By designing an effective knowledge distillation algorithm, the student model learns to imitate the behavior of the teacher model, achieving effective knowledge transfer so that the student model can eventually exhibit the same processing capability as the teacher model.
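The embodiments describe knowledge distillation only at a high level; a widely used formulation, sketched here as an assumption with an arbitrary temperature, softens the teacher and student outputs and minimizes their KL divergence:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # Soften both distributions with temperature T and push the student
    # toward the teacher's behavior via KL divergence; the T*T factor
    # keeps gradient magnitudes comparable across temperatures.
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
print(distillation_loss(student_logits, teacher_logits).item())
```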

First, the system architecture for the video style transfer method and the training method for a style transfer model provided by the embodiments of the present application is introduced.

FIG. 3 shows a system architecture 200 provided by an embodiment of the present application.

As shown in the system architecture 200 in FIG. 3, the data collection device 260 is used to collect training data. For the training method for a style transfer model in the embodiments of the present application, the target style transfer model can be trained with the training data collected by the data collection device 260.

Exemplarily, in the embodiments of the present application, the training data for training the target style transfer model may be N frames of sample content images, a sample style image, and N frames of sample composite images, where the N frames of sample composite images are images obtained by performing image style transfer processing on the N frames of sample content images according to the sample style image, and N is an integer greater than or equal to 2.

After the training data is collected, the data collection device 260 stores the training data in the database 230, and the training device 220 trains the target model/rule 201 (that is, the target style transfer model in the embodiments of the present application) based on the training data maintained in the database 230. The training device 220 inputs the training data into the style transfer model until the difference between the output data of the style transfer model being trained and the sample data satisfies a preset condition (for example, the difference between the predicted data and the sample data is less than a certain threshold, or the difference between the predicted data and the sample data remains unchanged or no longer decreases), thereby completing the training of the target model/rule 201.

The output data may refer to the N frames of predicted composite images output by the target style transfer model, and the sample data may refer to the N frames of sample composite images.

In the embodiments provided in the present application, the target model/rule 201 is obtained by training the target style transfer model, and the target style transfer model can be used to perform style transfer processing on a to-be-processed video. It should be noted that, in practical applications, the training data maintained in the database 230 is not necessarily all collected by the data collection device 260 and may also be received from other devices.

It should be noted that the training device 220 does not necessarily train the target model/rule 201 entirely based on the training data maintained in the database 230; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of the present application.

It should also be noted that at least part of the training data maintained in the database 230 may also be used in the process in which the execution device 210 processes the to-be-processed video.

The target model/rule 201 trained by the training device 220 can be applied to different systems or devices, for example, to the execution device 210 shown in FIG. 3. The execution device 210 may be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, an AR/VR device, or a vehicle-mounted terminal, and may also be a server, a cloud, or the like.

In FIG. 3, the execution device 210 is configured with an input/output (I/O) interface 212 for data interaction with external devices. A user may input data to the I/O interface 212 through the client device 240. In the embodiments of the present application, the input data may include a to-be-processed video input by the client device.

The preprocessing module 213 and the preprocessing module 214 are configured to perform preprocessing based on the input data (such as the to-be-processed video) received by the I/O interface 212. In the embodiments of the present application, the preprocessing module 213 and the preprocessing module 214 may be absent (or only one of them may be present), and the computing module 211 may be used directly to process the input data.

When the execution device 210 preprocesses the input data, or when the computing module 211 of the execution device 210 performs computation or other related processing, the execution device 210 may call data, code, and the like in the data storage system 250 for the corresponding processing, and may also store the data, instructions, and the like obtained from the corresponding processing into the data storage system 250.

Finally, the I/O interface 212 returns the processing result, such as the style-transferred video obtained for the to-be-processed video as described above, to the client device 240, thereby providing it to the user.

It is worth noting that the training device 220 can generate, for different goals or different tasks, corresponding target models/rules 201 based on different training data, and the corresponding target models/rules 201 can then be used to achieve the above goals or accomplish the above tasks, thereby providing the user with the desired result.

In the situation shown in FIG. 3, in one case, the user can manually specify the input data, and this manual specification can be operated through an interface provided by the I/O interface 212.

In another case, the client device 240 can automatically send input data to the I/O interface 212. If the user's authorization is required for the client device 240 to automatically send the input data, the user can set the corresponding permission in the client device 240. The user can view the result output by the execution device 210 on the client device 240, and the specific presentation form may be display, sound, action, or another specific manner. The client device 240 can also serve as a data collection terminal, collecting the input data of the I/O interface 212 and the output result of the I/O interface 212 as shown in the figure as new sample data and storing them in the database 230. Of course, collection may also bypass the client device 240; instead, the I/O interface 212 directly stores the input data of the I/O interface 212 and the output result of the I/O interface 212 as shown in the figure as new sample data into the database 230.

It should be noted that FIG. 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in FIG. 3, the data storage system 250 is external memory relative to the execution device 210; in other cases, the data storage system 250 may also be placed inside the execution device 210.

下面结合图4重点对卷积神经网络的结构进行详细的介绍。如上文的基础概念介绍所述,卷积神经网络是一种带有卷积结构的深度神经网络,是一种深度学习(deeplearning)架构,深度学习架构是指通过机器学习的算法,在不同的抽象层级上进行多个层次的学习。作为一种深度学习架构,卷积神经网络是一种前馈(feed-forward)人工神经网络,该前馈人工神经网络中的各个神经元可以对输入其中的图像作出响应。The following is a detailed introduction to the structure of the convolutional neural network in combination with Figure 4. As mentioned in the introduction to the basic concepts above, a convolutional neural network is a deep neural network with a convolutional structure and a deep learning architecture. Multiple levels of learning are performed at the abstraction level. As a deep learning architecture, a convolutional neural network is a feed-forward artificial neural network in which individual neurons can respond to images fed into it.

本申请实施例中风格迁移模型的网络结构可以如图4所示。在图4中,卷积神经网络300可以包括输入层310,卷积层/池化层320(其中,池化层为可选的),以及神经网络层330。其中,输入层310可以获取待处理图像,并将获取到的待处理图像交由卷积层/池化层320以及后面的神经网络层330进行处理,可以得到图像的处理结果。下面对图4中的CNN300中内部的层结构进行详细的介绍。The network structure of the style transfer model in the embodiment of the present application may be shown in FIG. 4 . In FIG. 4 , a convolutional neural network 300 may include an input layer 310 , a convolutional layer/pooling layer 320 (where the pooling layer is optional), and a neural network layer 330 . Wherein, the input layer 310 can obtain the image to be processed, and pass the obtained image to be processed by the convolution layer/pooling layer 320 and the subsequent neural network layer 330 to obtain the processing result of the image. The following is a detailed introduction to the internal layer structure of CNN300 in FIG. 4 .

The network structure of the style transfer model in the embodiments of the present application may be as shown in FIG. 4. In FIG. 4, a convolutional neural network 300 may include an input layer 310, convolutional layers/pooling layers 320 (where the pooling layers are optional), and neural network layers 330. The input layer 310 can obtain the to-be-processed image and hand the obtained to-be-processed image over to the convolutional layers/pooling layers 320 and the subsequent neural network layers 330 for processing, so as to obtain the processing result of the image. The internal layer structure of the CNN 300 in FIG. 4 is described in detail below.

Convolutional layers/pooling layers 320:

As shown in FIG. 4, the convolutional layers/pooling layers 320 may include, for example, layers 321 to 326. In one implementation, layer 321 is a convolutional layer, layer 322 is a pooling layer, layer 323 is a convolutional layer, layer 324 is a pooling layer, layer 325 is a convolutional layer, and layer 326 is a pooling layer. In another implementation, layers 321 and 322 are convolutional layers, layer 323 is a pooling layer, layers 324 and 325 are convolutional layers, and layer 326 is a pooling layer. That is, the output of a convolutional layer can serve as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.

The following takes the convolutional layer 321 as an example to introduce the internal working principle of a convolutional layer.

The convolutional layer 321 may include many convolution operators. A convolution operator is also called a kernel, and its role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator may essentially be a weight matrix, which is usually predefined. In the process of performing a convolution operation on an image, the weight matrix is usually processed on the input image pixel by pixel (or two pixels by two pixels, depending on the value of the stride) along the horizontal direction, so as to extract specific features from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends over the entire depth of the input image. Therefore, convolving with a single weight matrix produces a convolutional output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as being determined by the "multiple" mentioned above.

Different weight matrices can be used to extract different features from the image. For example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The multiple weight matrices have the same size (rows × columns), so the convolutional feature maps extracted by them also have the same size; the extracted convolutional feature maps of the same size are then combined to form the output of the convolution operation.

In practical applications, the weight values in these weight matrices need to be obtained through extensive training. The weight matrices formed by the trained weight values can be used to extract information from the input image, enabling the convolutional neural network 300 to make correct predictions.

When the convolutional neural network 300 has multiple convolutional layers, the initial convolutional layers (for example, 321) often extract more general features, which may also be called low-level features. As the depth of the convolutional neural network 300 increases, the features extracted by the later convolutional layers (for example, 326) become more and more complex, for example, high-level semantic features; features with higher-level semantics are more suitable for the problem to be solved.

Since it is often necessary to reduce the number of training parameters, pooling layers often need to be periodically introduced after convolutional layers. In the layers 321 to 326 illustrated as 320 in FIG. 4, there may be one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers. In image processing, the purpose of a pooling layer is to reduce the spatial size of the image. A pooling layer may include an average pooling operator and/or a max pooling operator for sampling the input image to obtain an image of smaller size. The average pooling operator computes the average of the pixel values of the image within a specific range as the result of average pooling. The max pooling operator takes the pixel with the largest value within a specific range as the result of max pooling. In addition, just as the size of the weight matrix in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer; each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
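A short sketch of the two pooling operators; the tensor sizes and the 2x2 pooling window are arbitrary example values:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 16, 32, 32)          # feature maps from a conv layer

avg = F.avg_pool2d(x, kernel_size=2)    # each output pixel = mean of a 2x2 region
mx = F.max_pool2d(x, kernel_size=2)     # each output pixel = max of a 2x2 region
print(avg.shape, mx.shape)              # both (1, 16, 16, 16), half the spatial size
```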

Neural network layers 330:

After processing by the convolutional layers/pooling layers 320, the convolutional neural network 300 is still not sufficient to output the required output information. As described above, the convolutional layers/pooling layers 320 only extract features and reduce the parameters brought by the input image. To generate the final output information (the required class information or other relevant information), the convolutional neural network 300 needs to use the neural network layers 330 to generate one output or a group of outputs whose number equals the number of required classes. Therefore, the neural network layers 330 may include multiple hidden layers (331, 332 to 33n as shown in FIG. 4) and an output layer 340. The parameters contained in the multiple hidden layers may be obtained by pre-training on relevant training data of a specific task type; for example, the task type may include image recognition, image classification, image detection, image super-resolution reconstruction, and so on.

After the multiple hidden layers in the neural network layers 330, that is, as the final layer of the entire convolutional neural network 300, comes the output layer 340, which has a loss function similar to categorical cross-entropy and is specifically used to compute the prediction error. Once the forward propagation of the entire convolutional neural network 300 is completed (as shown in FIG. 4, propagation in the direction from 310 to 340 is forward propagation), backpropagation (as shown in FIG. 4, propagation in the direction from 340 to 310 is backpropagation) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 300, that is, the error between the result output by the convolutional neural network 300 through the output layer and the ideal result.

It should be noted that the convolutional neural network shown in FIG. 4 is only an example of the structure of the target style transfer model of the embodiments of the present application. In specific applications, the style transfer model used by the video style transfer method of the embodiments of the present application may also exist in the form of other network models.

FIG. 5 shows a hardware structure of a chip provided by an embodiment of the present application. The chip includes a neural-network processing unit (NPU) 400. The chip can be provided in the execution device 210 shown in FIG. 3 to complete the computing work of the computing module 211. The chip can also be provided in the training device 220 shown in FIG. 3 to complete the training work of the training device 220 and output the target model/rule 201. The algorithms of all layers in the convolutional neural network shown in FIG. 4 can be implemented in the chip shown in FIG. 5.

The NPU 400 is mounted as a coprocessor on a host central processing unit (CPU), and the host CPU assigns tasks. The core part of the NPU 400 is the operation circuit 403; the controller 404 controls the operation circuit 403 to fetch data from memory (the weight memory or the input memory) and perform operations.

In some implementations, the operation circuit 403 internally includes multiple processing engines (PEs). In some implementations, the operation circuit 403 is a two-dimensional systolic array. The operation circuit 403 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 403 is a general-purpose matrix processor.

For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The operation circuit 403 fetches the data corresponding to matrix B from the weight memory 402 and caches it on each PE in the operation circuit 403. The operation circuit 403 fetches the data of matrix A from the input memory 401, performs a matrix operation with matrix B, and stores the partial or final result of the obtained matrix in the accumulator 408.

The vector computation unit 407 can perform further processing on the output of the operation circuit 403, such as vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. For example, the vector computation unit 407 can be used for network computation of the non-convolutional/non-FC layers in a neural network, such as pooling, batch normalization, and local response normalization.

In some implementations, the vector computation unit 407 can store the processed output vector in the unified memory 406. For example, the vector computation unit 407 can apply a nonlinear function to the output of the operation circuit 403, for example, to a vector of accumulated values, to generate activation values. In some implementations, the vector computation unit 407 generates normalized values, merged values, or both.

In some implementations, the processed output vector can be used as an activation input to the operation circuit 403, for example, for use in subsequent layers of the neural network.

Through the direct memory access controller (DMAC) 405, input data in external memory is transferred to the input memory 401 and/or the unified memory 406, weight data in external memory is stored into the weight memory 402, and data in the unified memory 406 is stored into external memory.

The bus interface unit (BIU) 410 is configured to realize interaction among the host CPU, the DMAC, and the instruction fetch buffer 409 through the bus.

The instruction fetch buffer 409 connected to the controller 404 is configured to store instructions used by the controller 404.

The controller 404 is configured to call the instructions cached in the instruction fetch buffer 409 so as to control the working process of the operation accelerator.

Generally, the unified memory 406, the input memory 401, the weight memory 402, and the instruction fetch buffer 409 are all on-chip memories, and the external memory is memory external to the NPU. The external memory may be double data rate synchronous dynamic random access memory (DDR SDRAM), high bandwidth memory (HBM), or other readable and writable memory.

The operations of the layers in the convolutional neural network shown in FIG. 4 may be performed by the operation circuit 403 or the vector computation unit 407.

The execution device 210 in FIG. 3 introduced above can perform the steps of the video style transfer method of the embodiments of the present application. The CNN model shown in FIG. 4 and the chip shown in FIG. 5 can also be used to perform the steps of the video style transfer method of the embodiments of the present application.

FIG. 6 shows a system architecture 500 provided by an embodiment of the present application. The system architecture includes a local device 520, a local device 530, an execution device 510, and a data storage system 550, where the local device 520 and the local device 530 are connected to the execution device 510 through a communication network.

The execution device 510 may be implemented by one or more servers. Optionally, the execution device 510 may be used in cooperation with other computing devices, for example, data storage devices, routers, and load balancers. The execution device 510 may be arranged at one physical site or distributed across multiple physical sites. The execution device 510 may use the data in the data storage system 550, or call the program code in the data storage system 550, to implement the video style transfer method of the embodiments of the present application.

It should be noted that the execution device 510 may also be called a cloud device; in this case, the execution device 510 may be deployed in the cloud.

Specifically, the execution device 510 may perform the following process:

obtaining training data, where the training data includes N frames of sample content images, a sample style image, and N frames of composite images, the N frames of composite images are images obtained by performing image style transfer processing on the N frames of sample content images according to the sample style image, and N is an integer greater than or equal to 2; performing, through a neural network model, image style transfer processing on the N frames of sample content images according to the sample style image to obtain N frames of predicted composite images; and determining the parameters of the neural network model according to an image loss function between the N frames of sample content images and the N frames of predicted composite images, where the image loss function includes a low-rank loss function, the low-rank loss function is used to represent the difference between a first low-rank matrix and a second low-rank matrix, the first low-rank matrix is obtained based on the N frames of sample content images and optical flow information, the second low-rank matrix is obtained based on the N frames of predicted composite images and the optical flow information, and the optical flow information is used to represent the position differences of corresponding pixels between two adjacent frames of images in the N frames of sample content images.

Alternatively, the execution device 510 may perform the following process:

obtaining a to-be-processed video, where the to-be-processed video includes N frames of to-be-processed content images, and N is an integer greater than or equal to 2; performing image style transfer processing on the N frames of to-be-processed content images according to a target style transfer model to obtain N frames of composite images; and obtaining, according to the N frames of composite images, the style-transferred video corresponding to the to-be-processed video, where the parameters of the target style transfer model are determined according to an image loss function for style transfer processing performed by the target style transfer model on N frames of sample content images, the image loss function includes a low-rank loss function, the low-rank loss function is used to represent the difference between a first low-rank matrix and a second low-rank matrix, the first low-rank matrix is obtained based on the N frames of sample content images and optical flow information, the second low-rank matrix is obtained based on N frames of predicted composite images and the optical flow information, the optical flow information is used to represent the position differences of corresponding pixels between two adjacent frames of images in the N frames of sample content images, and the N frames of predicted composite images refer to the images obtained by performing, through the target style transfer model, image style transfer processing on the N frames of sample content images according to a sample style image.

Users can operate their respective user devices (for example, the local device 520 and the local device 530) to interact with the execution device 510. Each local device can represent any computing device, for example, a personal computer, a computer workstation, a smartphone, a tablet computer, a smart camera, a smart car or another type of cellular phone, a media consumption device, a wearable device, a set-top box, or a game console.

The local device of each user can interact with the execution device 510 through a communication network of any communication mechanism/communication standard. The communication network may be a wide area network, a local area network, a point-to-point connection, or any combination thereof.

In one implementation, the local device 520 and the local device 530 can obtain the relevant parameters of the target style transfer model from the execution device 510, deploy the target style transfer model on the local device 520 and the local device 530, and use the target style transfer model to perform video style transfer processing and the like.

In another implementation, the target style transfer model can be deployed directly on the execution device 510; the execution device 510 obtains the to-be-processed video from the local device 520 and the local device 530, and performs style transfer processing and the like on the to-be-processed video according to the target style transfer model.

At present, video style transfer models that use optical flow information to stabilize stylized videos mainly fall into two categories. The first uses optical flow in the training process of the style transfer model but does not introduce optical flow information in the test phase; the second integrates an optical flow module into the structure of the style transfer model. However, although the first approach guarantees the computational efficiency of the style transfer model in the test phase, the stability of the resulting style-transferred video is poor. The second approach guarantees the stability of the output video after style transfer processing, but because an optical flow module is introduced, the optical flow information between the image frames of the video must be computed in the test phase, so the computational efficiency of the style transfer model in the test phase cannot be guaranteed.

In view of this, the embodiments of the present application propose a training method for a style transfer model and a video style transfer method. A low-rank loss function is introduced into the process of training the style transfer model for video; by learning low-rank information, the stability of the style-transferred video can be synchronized with that of the original video, thereby improving the stability of the style-transferred video obtained by the target style transfer model. In addition, the style transfer model proposed in the embodiments of the present application does not need to compute the optical flow information between the multiple frames of images included in the video in the test phase, that is, in the process of performing style transfer processing on the to-be-processed video. Therefore, the target style transfer model provided by the embodiments of the present application can improve stability while shortening the style transfer processing time of the model and improving the running efficiency of the target style transfer model.

图7示出了本申请实施例提供的风格迁移模型的训练方法600的示意性流程图,该方法可以由能够进行图像风格迁移的装置执行;例如,该训练方法可以由图6中的执行设备510执行,或者,也可以由本地设备520执行。其中,训练方法600包括S610至S630,下面分别对这些步骤进行详细的描述。FIG. 7 shows a schematic flow chart of a method 600 for training a style transfer model provided by an embodiment of the present application. The method can be executed by a device capable of image style transfer; for example, the training method can be performed by the execution device in FIG. 6 510, or may also be executed by the local device 520. Wherein, the training method 600 includes S610 to S630, and these steps will be described in detail below respectively.

S610、获取训练数据。S610. Obtain training data.

其中,训练数据可以包括N帧样本内容图像、样本风格图像以及N帧合成图像,N帧合成图像是根据样本风格图像对N帧样本内容图像进行图像风格迁移处理后得到的图像,N为大于或者等于2的整数。Wherein, the training data may include N frames of sample content images, sample style images, and N frames of composite images. The N frames of composite images are images obtained after performing image style transfer processing on N frames of sample content images according to the sample style images, and N is greater than or An integer equal to 2.

示例性地,N帧样本内容图像可以是指样本视频中包括的N帧连续的样本内容图像;N帧合成图像可以是指根据样本风格图像对样本视频进行风格迁移处理后得到的视频中包括的N帧连续的合成图像。Exemplarily, the N frames of sample content images may refer to N frames of continuous sample content images included in the sample video; the N frames of composite images may refer to the samples included in the video obtained after performing style transfer processing on the sample video according to the sample style image N consecutive composite images.

应理解,对于单帧图像的风格迁移处理即图像风格迁移而言只需要考虑内容图像中的内容以及风格图像中的风格;但是对于视频而言,由于视频中包括多帧连续的视频,视频的风格迁移不仅要考虑图像的风格化效果,还要考虑多帧图像之间的稳定性;即需要确保风格迁移处理后视频的流畅性,避免出现闪屏、伪影等噪声。It should be understood that for the style transfer processing of a single-frame image, that is, image style transfer, only the content in the content image and the style in the style image need to be considered; Style transfer should not only consider the stylized effect of the image, but also consider the stability between multiple frames of images; that is, it is necessary to ensure the smoothness of the video after the style transfer process and avoid noise such as splash screens and artifacts.

It should be noted that the above N frames of sample content images refer to N adjacent frames in the video, and the N frames of composite images refer to the images corresponding to the N frames of sample content images.

S620: Perform image style transfer processing on the N frames of sample content images according to the sample style image through a neural network model to obtain N frames of predicted composite images.

Exemplarily, the N frames of sample content images included in the sample video and the sample style image may be input to the neural network model.

For example, the N frames of sample content images may be input to the neural network model frame by frame; the neural network model may perform image style transfer processing on each frame of sample content image according to the sample style image, thereby obtaining one frame of predicted composite image corresponding to that frame of sample content image. After performing the above process N times, N frames of predicted composite images corresponding to the N frames of sample content images can be obtained.

Alternatively, multiple frames of the N frames of sample content images may be input to the neural network model at once, and the neural network model may perform image style transfer processing on the multiple frames of sample content images according to the sample style image.
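As a minimal sketch of the frame-by-frame variant of S620, under the assumption that the network maps one content frame to one composite frame (the names and tensor shapes are illustrative, not the patent's code):

```python
def stylize_frames(model, frames):
    # frames: iterable of (1, 3, H, W) tensors, fed one frame at a time (S620);
    # returns the N corresponding predicted composite frames
    return [model(x) for x in frames]
```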

S630: Determine the parameters of the neural network model according to an image loss function between the N frames of sample content images and the N frames of predicted composite images.

The image loss function includes a low-rank loss function. The low-rank loss function is used to represent the difference between a first low-rank matrix and a second low-rank matrix, where the first low-rank matrix is obtained based on the N frames of sample content images and optical flow information, the second low-rank matrix is obtained based on the N frames of predicted composite images and the optical flow information, and the optical flow information is used to represent the position differences of corresponding pixels between two adjacent frames of the N frames of sample content images.

It should be noted that, for a matrix composed of multiple frames of images, the low-rank matrix can be used to represent the regions that appear in all N frames of images and are not motion boundaries. A sparse matrix can be used to represent regions that appear intermittently in the N frames of images; for example, the sparse matrix may correspond to regions that newly appear or disappear at the image borders due to camera movement, or the boundary regions of moving objects.

For example, the N frames of sample content images may be images of a user in motion; the low-rank matrix formed from the N frames of sample content images can then be used to represent the regions that appear in all N frames of sample content images and are not motion boundaries. For instance, this low-rank matrix may represent the background region where no motion occurs, or the region of the user that appears in all N frames of sample content images and is not a motion boundary.

Exemplarily, assume the video includes 5 consecutive frames of images, i.e., 5 consecutive frames of sample content images, and that the 5-frame style transfer result obtained by performing style transfer processing on the 5 frames of sample content images is 5 frames of sample composite images. The low-rank matrix can be computed as follows:

Step 1: Calculate the optical flow information between image frames according to the 5 frames of images in the video, i.e., the 5 frames of sample content images.

Step 2: Calculate mask information according to the optical flow information, where the mask information can be used to represent the changed regions between two consecutive frames of images obtained according to the optical flow information.

Step 3: According to the optical flow information and the mask information, calculate the low-rank part after aligning the 1st, 2nd, 4th, and 5th frame images with the 3rd frame image, and set the sparse part to 0.

Step 4: Expand the aligned 5 frames of images into vectors and combine them column-wise into one matrix; this matrix can be a low-rank matrix (as sketched below).
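The following NumPy sketch illustrates Steps 1 to 4 for 5 frames, assuming the optical flow fields and masks of Steps 1 and 2 have already been computed (for example, with an off-the-shelf flow network); the nearest-neighbor warp and the helper names are assumptions for illustration, not the patent's code.

```python
# Hedged sketch of Steps 1-4: align every frame to the center frame tau,
# zero out the sparse part with the mask, then vectorize and stack.
import numpy as np

def warp(frame, flow):
    # nearest-neighbor backward warp of an HxWx3 frame by an HxWx2 flow field
    H, W, _ = frame.shape
    ys, xs = np.mgrid[0:H, 0:W]
    sx = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    sy = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    return frame[sy, sx]

def build_low_rank_matrix(frames, flows, masks, tau=2):
    # frames: list of 5 HxWx3 arrays; flows[t]/masks[t] align frame t to frame tau
    aligned = []
    for t, frame in enumerate(frames):
        if t == tau:
            aligned.append(frame)                            # center frame: no alignment
        else:
            r = masks[t][..., None] * warp(frame, flows[t])  # Step 3: sparse part -> 0
            aligned.append(r)
    # Step 4: vectorize each aligned frame and stack -> matrix of shape (5, H*W*3)
    return np.stack([f.reshape(-1) for f in aligned], axis=0)
```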

It should be understood that, when computing the low-rank loss function, the goal is for the rank of the low-rank part of the image matrix formed from the 5 frames of sample content images to be approximated by the rank of the low-rank part of the image matrix formed from the 5 frames of sample composite images. The rank of the low-rank part can be continuously optimized by optimizing the nuclear norm, and the nuclear norm is obtained by performing singular value decomposition on the matrix.

Exemplarily, consider K consecutive frames of images x_1, ..., x_K, their corresponding optical flow information f (which may include, for example, forward optical flow information and backward optical flow information), and the composite images N_s(x_1), ..., N_s(x_K) output by the student model.

First, the K frames of composite images can be mapped to one fixed frame according to the forward optical flow information, the backward optical flow information, and the mask information; in general this fixed frame is frame τ = [K/2]. That is, for the t-th frame of composite image, after it is mapped to the τ-th frame of composite image, its low-rank matrix can be expressed as:

R_t = M_{t−τ} ⊙ W[N_s(x_t), f_{t−τ}];

where M_{t−τ} denotes the mask information calculated from the forward optical flow information and the backward optical flow information of the K frames of images, and W denotes the warping (mapping) operation.
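A hedged PyTorch sketch of the warp operation W and the masking above; the tensor layouts (B,C,H,W images, B,2,H,W flow with channel 0 as the x-displacement, B,1,H,W masks) are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    # backward-warp img (B,C,H,W) by flow (B,2,H,W) using bilinear sampling
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(img.device)   # (2,H,W) pixel grid
    coords = base.unsqueeze(0) + flow                            # per-pixel sample positions
    # normalize to [-1, 1] as required by grid_sample
    gx = 2.0 * coords[:, 0] / (W - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                         # (B,H,W,2)
    return F.grid_sample(img, grid, align_corners=True)

def aligned_low_rank_part(stylized_t, flow_t_tau, mask_t_tau):
    # R_t = M_{t-tau} ⊙ W[N_s(x_t), f_{t-tau}]
    return mask_t_tau * warp(stylized_t, flow_t_tau)
```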

According to Step 4, the matrix X obtained by vectorizing the aligned frames and combining them column-wise is X = [vec(R_0), ..., vec(R_K)]^T ∈ R^{K×L}, where L = H*W*3; K denotes the number of rows of the matrix X, i.e., the number of image frames; L denotes the number of columns of the matrix X; H denotes the height of each frame of image; and W denotes the width of each frame of image.

Singular value decomposition is performed on X to obtain the nuclear norm. The decomposition is X = uΣv^T, where the matrix X has size K×L, u ∈ R^{K×K}, and v ∈ R^{L×L}; the nuclear norm is ||X||* = tr(Σ). Here tr denotes the trace of a matrix; for example, for an n×n matrix A, the sum of the elements on its main diagonal (the diagonal from the upper left to the lower right) is called the trace of A, written tr(A).

The low-rank loss function is:

L = (||X_input||* − ||X_s||*)²;

where X_input denotes the vectorized matrix obtained from the input K frames of images, and X_s denotes the vectorized matrix obtained from the K frames of composite images output by the target student network.
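A minimal NumPy sketch of this loss, assuming X_input and X_s were built as in Step 4 above (shape K×L):

```python
import numpy as np

def nuclear_norm(X):
    # ||X||* = tr(Sigma): the sum of the singular values of X
    return np.linalg.svd(X, compute_uv=False).sum()

def low_rank_loss(X_input, X_s):
    # L = (||X_input||* - ||X_s||*)^2
    return (nuclear_norm(X_input) - nuclear_norm(X_s)) ** 2
```

In an actual training loop a differentiable SVD (for example torch.linalg.svdvals) would be used in place of NumPy so that gradients can flow back to the student network.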

In the embodiments of the present application, the goal of introducing the low-rank loss function when training the style transfer model is that the regions that appear in multiple adjacent frames of content images in the original video and are not motion boundaries remain the same after style transfer processing; that is, the rank of those regions in the style-transferred video approximates the rank of the same regions in the original video, so that the stability of the style-transferred video can be improved.

Further, in the embodiments of the present application, the above image loss function further includes a residual loss function. The residual loss function is obtained according to the difference between a first sample composite image and a second sample composite image, where the first sample composite image is obtained by performing image style transfer processing on the N frames of sample content images through a first model, and the second sample composite image is obtained by performing image style transfer processing on the N frames of sample content images through a second model. The first model and the second model are image style transfer models pre-trained according to the sample style image; the second model includes an optical flow module, the first model does not include an optical flow module, and the optical flow module is used to determine the optical flow information of the N frames of sample content images.

It should be noted that the above first model and second model refer to style transfer models trained with the same style image; the difference between them is that the first model does not include an optical flow module while the second model does. That is, the first model and the second model may use the same sample content images and sample style image during training; for example, in the training phase the first model and the second model may refer to the same model. However, in the test phase the second model also needs to compute the optical flow information between multiple frames of sample content images, whereas the first model does not.

In the embodiments of the present application, the goal of introducing the residual loss function when training the target style transfer model is to enable the neural network model, during training, to learn the difference between the composite images output by the style transfer model that includes the optical flow module and the style transfer model that does not, thereby improving the stability of the style-transferred video obtained by the target transfer model.

Further, in the embodiments of the present application, a teacher-student model learning strategy may be adopted to meet the deployment requirements of mobile terminals; that is, the style transfer model to be trained may be a target student model. During training, the parameters of the student model can be updated through the image loss function to obtain the target student model.

It should be noted that the network structure of the student model is the same as that of the target student model; the student model may refer to a pre-trained style transfer model that does not require optical flow information as input in the test phase.

Optionally, in a possible implementation, the first model and the second model may be pre-trained teacher models, and the target style transfer model refers to the target student model obtained by training a student model to be trained according to the residual loss function and a knowledge distillation algorithm.

It should be understood that knowledge distillation is a key technique for miniaturizing deep learning models to meet the deployment requirements of terminal devices. Compared with compression techniques such as quantization and sparsification, it does not require specific hardware support to achieve model compression. Knowledge distillation adopts a teacher-student model learning strategy: the teacher model generally has a large number of parameters and cannot meet deployment requirements, while the student model has few parameters and can be deployed directly. By designing an effective knowledge distillation algorithm, the student model learns to imitate the behavior of the teacher model, effectively transferring knowledge so that the student model can ultimately exhibit the same processing capability as the teacher model.

In the embodiments of the present application, knowledge distillation can be performed on the model without an optical flow module at test time by using the model that includes an optical flow module at test time. In video style transfer, since the teacher model and the student model differ in structure and in training method, the stylization effects of the student model and the teacher model may not be exactly the same; if the student model directly learns the output of the teacher model at the pixel level, the student model's output may exhibit ghosting or blurring. In the embodiments of the present application, the target style transfer model may refer to the target student model; by adopting the teacher-student knowledge distillation method, the difference between the style transfer results output by the student model to be trained and the pre-trained basic model is made to continually approach the difference between the style transfer results output by the teacher model that includes the optical flow module and the teacher model that does not. This training method can effectively avoid the ghosting caused by inconsistent styles of the teacher model and the student model.

Optionally, in a possible implementation, the residual loss function is obtained according to the following equation:

L_res = Σ_i ||(N_T(x_i) − Ñ_T(x_i)) − (N_S(x_i) − Ñ_S(x_i))||²;

where L_res denotes the residual loss function; N_T denotes the second model; Ñ_T denotes the first model; N_S denotes the student model to be trained; Ñ_S denotes the pre-trained basic model, which has the same network structure as the student model to be trained; and x_i denotes the i-th frame of sample content image included in the sample video, with i being a positive integer.
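The following PyTorch fragment is a hedged sketch of this residual term under two assumptions: the teachers and the base model are frozen, and, for brevity, the flow teacher is shown taking only the current frame (in FIG. 8 it additionally receives the warped previous output and the change information).

```python
# Hedged sketch of the residual (distillation) loss: the student/base residual
# is pushed toward the teacher-with-flow / teacher-without-flow residual.
# Model names mirror the equation above; they are illustrative, not the patent's code.
import torch

def residual_loss(N_T, N_T_tilde, N_S, N_S_tilde, x):
    with torch.no_grad():                          # teachers and base model are frozen
        teacher_residual = N_T(x) - N_T_tilde(x)   # N_T(x_i) - Ñ_T(x_i)
        base_out = N_S_tilde(x)
    student_residual = N_S(x) - base_out           # N_S(x_i) - Ñ_S(x_i)
    return ((teacher_residual - student_residual) ** 2).sum()
```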

In one example, the target style transfer model may refer to the target student model. When training the target student model, a student model to be trained may be trained according to a pre-trained first teacher model (not including the optical flow module), a pre-trained second teacher model (including the optical flow module), and a pre-trained basic model to obtain the target student model, where the student model to be trained, the pre-trained basic model, and the target student model all have the same network structure; the student model to be trained is trained through the above low-rank loss function, residual loss function, and perceptual loss function to obtain the target student model.

The above pre-trained basic model may refer to a style transfer model pre-trained through a perceptual loss function that does not include an optical flow module in the test phase; alternatively, the pre-trained basic model may refer to a style transfer model pre-trained through a perceptual loss function and an optical flow loss function that does not include an optical flow module in the test phase. The perceptual loss function is used to represent the content loss between the composite image and the content image and the style loss between the composite image and the style image; the optical flow loss function is used to represent the difference between corresponding pixels of composite images of adjacent frames.

In a possible implementation, in the process of training the student model to be trained, the above residual loss function makes the difference between the transfer results (also called composite images) output by the student model to be trained and the pre-trained basic model continually approach the difference between the transfer results output by the second model and the first model.

In the embodiments of the present application, the target style transfer model may refer to the target student model. By adopting the teacher-student knowledge distillation method, the difference between the style transfer results output by the student model to be trained and the pre-trained basic model is made to continually approach the difference between the style transfer results output by the teacher model that includes the optical flow module and the teacher model that does not; this training method can effectively avoid the ghosting caused by inconsistent styles of the teacher model and the student model.

In one example, the parameters of the neural network model are determined according to the image loss function between the N frames of sample composite images and the N frames of predicted composite images, where the image loss function includes the above low-rank loss function, and the low-rank loss function is used to represent the difference between the low-rank matrix formed from the N frames of sample content images and the low-rank matrix formed from the N frames of sample composite images.

In another example, the parameters of the neural network model are determined according to the image loss function between the N frames of sample composite images and the N frames of predicted composite images, where the image loss function includes the above low-rank loss function and the above residual loss function. The low-rank loss function is used to represent the difference between the low-rank matrix formed from the N frames of sample content images and the low-rank matrix formed from the N frames of sample composite images; the residual loss function is obtained according to the difference between the first sample composite image and the second sample composite image. The first sample composite image refers to the image obtained by performing image style transfer processing on the N frames of sample content images through the first model, and the second sample composite image refers to the image obtained by performing image style transfer processing on the N frames of sample content images through the second model; the first model and the second model are image style transfer models pre-trained according to the same sample style image; the second model includes an optical flow module and the first model does not; the optical flow module is used to determine the optical flow information of the N frames of sample content images.

Optionally, in a possible implementation, the image loss function further includes a perceptual loss function, where the perceptual loss function includes a content loss and a style loss; the content loss is used to represent the image content difference between the N frames of predicted composite images and their corresponding N frames of sample content images, and the style loss is used to represent the image style difference between the N frames of predicted composite images and the sample style image.

The perceptual loss function can be used to represent the content similarity between a sample content image and the corresponding composite image, and the style similarity between the sample style image and the corresponding composite image.
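The patent does not fix a particular backbone for the perceptual loss; the following PyTorch sketch uses the common VGG-16 formulation (content loss on feature maps, style loss on Gram matrices), with the layer indices chosen purely for illustration.

```python
# Hedged sketch of a perceptual loss in the usual feature/Gram form; VGG-16 and
# the layer indices are assumptions, not values prescribed by the patent.
import torch
import torchvision

vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def extract(x, layers):
    feats = {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in layers:
            feats[i] = x
    return feats

def gram(f):
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # normalized Gram matrix

def perceptual_loss(composite, content, style,
                    content_layer=15, style_layers=(3, 8, 15, 22)):
    layers = set(style_layers) | {content_layer}
    fc, fx, fs = extract(composite, layers), extract(content, layers), extract(style, layers)
    content_loss = torch.nn.functional.mse_loss(fc[content_layer], fx[content_layer])
    style_loss = sum(torch.nn.functional.mse_loss(gram(fc[i]), gram(fs[i]))
                     for i in style_layers)
    return content_loss + style_loss
```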

In one example, the parameters of the neural network model are determined according to the image loss function between the N frames of sample composite images and the N frames of predicted composite images, where the image loss function includes the above low-rank loss function, the above residual loss function, and the above perceptual loss function.

For example, the image loss is obtained by weighting the above low-rank loss function, the above residual loss function, and the above perceptual loss function.
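A minimal sketch of the weighted combination, assuming illustrative weight hyperparameters (the patent does not specify their values):

```python
def image_loss(perceptual_term, residual_term, low_rank_term,
               w_perc=1.0, w_res=1.0, w_rank=1.0):
    # the three weights are assumed hyperparameters, not values from the patent
    return w_perc * perceptual_term + w_res * residual_term + w_rank * low_rank_term
```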

Optionally, in a possible implementation, the parameters of the target style transfer model are obtained through multiple iterations of a backpropagation algorithm based on the image loss function.

Exemplarily, FIG. 8 is a schematic diagram of the training process of the style transfer model provided by an embodiment of the present application.

As shown in FIG. 8, the first teacher model may refer to the above second model, i.e., a pre-trained style transfer model that includes an optical flow module; the second teacher model may refer to the above first model, i.e., a pre-trained style transfer model that does not include an optical flow module; the pre-trained basic model, the student model to be trained, and the target student model all have the same network structure. The input data of the first teacher model may include the T-th frame of content image, the (T−1)-th frame of composite image processed with the optical flow information, and the change information calculated with the optical flow information, where the change information may refer to the regions that differ between the two frames of content images obtained according to the (T−1)-th frame of content image and the T-th frame of content image, and the optical flow information may refer to the motion information of corresponding pixels in the (T−1)-th and T-th frames of content images; the output data of the first teacher model is the T-th frame of composite image (#1). For the second teacher model, since that model does not include an optical flow module, the change information in its input data may be set entirely to 1, and the (T−1)-th frame of composite image processed with the optical flow information may be set entirely to 0; the output data of the second teacher model is the T-th frame of composite image (#2). The input data of the student model to be trained is the T-th frame of content image, and its output data is the T-th frame of composite image (#3). During training, the input data of the pre-trained basic model may be the T-th frame of content image, and its output data is the predicted T-th frame of composite image (#4). By sequentially inputting the T-th to the (T+N−1)-th frames, i.e., N frames of sample content images, the pre-trained basic model can obtain N frames of predicted composite images for the T-th to the (T+N−1)-th frames. According to the image loss function, i.e., the low-rank loss function, the residual loss function, and the perceptual loss function, the parameters of the student model to be trained are continuously updated through the backpropagation algorithm to obtain the trained target student model.
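Putting the pieces of FIG. 8 together, one training step might look like the following hedged sketch; the batch layout, the simplification of the teacher inputs to the current frame only, and the loss-function handles are illustrative assumptions rather than the patent's exact procedure.

```python
# Hedged sketch of one training step: teachers and the base model are frozen,
# the student is updated by backpropagation on the combined image loss.
import torch

def train_step(student, base, teacher_flow, teacher_noflow, batch, optimizer,
               perceptual_loss_fn, low_rank_loss_fn, style_image):
    frames, flows, masks = batch                              # N content frames + flow data
    optimizer.zero_grad()
    stylized = [student(x) for x in frames]                   # outputs #3
    with torch.no_grad():
        base_out = [base(x) for x in frames]                  # outputs #4
        t_flow = [teacher_flow(x) for x in frames]            # outputs #1
        t_noflow = [teacher_noflow(x) for x in frames]        # outputs #2
    perc = sum(perceptual_loss_fn(o, x, style_image) for o, x in zip(stylized, frames))
    res = sum(((tf - tn) - (s - b)).pow(2).sum()              # residual term
              for tf, tn, s, b in zip(t_flow, t_noflow, stylized, base_out))
    rank = low_rank_loss_fn(frames, stylized, flows, masks)   # low-rank term
    loss = perc + res + rank                                  # weights omitted for brevity
    loss.backward()                                           # backpropagation
    optimizer.step()
    return float(loss)
```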

The above pre-trained basic model may refer to a style transfer model pre-trained through a perceptual loss function that does not include an optical flow module in the test phase; alternatively, the pre-trained basic model may refer to a style transfer model pre-trained through a perceptual loss function and an optical flow loss function that does not include an optical flow module in the test phase. The perceptual loss function is used to represent the content loss between the composite image and the content image and the style loss between the composite image and the style image; the optical flow loss function is used to represent the difference between corresponding pixels of composite images of adjacent frames.

In the embodiments of the present application, a low-rank loss function is introduced in the process of training the style transfer model for video; by learning low-rank information, the stability of the style-transferred video can be synchronized with that of the original video, thereby improving the stability of the style-transferred video obtained by the target transfer model.

In addition, the style transfer model for video trained in the embodiments of the present application may be a target student model obtained by adopting the teacher-student model learning strategy. On the one hand, this can meet the requirements of deploying the style transfer model on mobile devices; on the other hand, when training the target student model, the student model learns the difference between the output information of the teacher model that includes the optical flow module and the teacher model that does not, which effectively avoids the ghosting caused by inconsistent styles of the teacher model and the student model, thereby improving the stability of the style-transferred video obtained by the target transfer model.

FIG. 9 shows a schematic flowchart of a method 700 for video style transfer provided by an embodiment of the present application. The method may be executed by an apparatus capable of performing image style transfer; for example, the method may be executed by the execution device 510 in FIG. 6, or by the local device 520. The method 700 includes S710 to S730, and these steps are described in detail below.

S710: Obtain a video to be processed.

The video to be processed includes N frames of content images to be processed, where N is an integer greater than or equal to 2.

Exemplarily, the video to be processed may be a video captured by an electronic device through a camera, or a video obtained from inside the electronic device (for example, a video stored in the photo album of the electronic device, or a video obtained by the electronic device from the cloud).

It should be understood that the above video to be processed may be any video requiring style transfer; the present application does not impose any limitation on the source of the video to be processed.

S720: Perform image style transfer processing on the N frames of content images to be processed according to a target style transfer model to obtain N frames of composite images.

S730: Obtain the style-transferred video corresponding to the video to be processed according to the N frames of composite images.

The parameters of the target style transfer model are determined according to an image loss function for performing style transfer processing on N frames of sample content images with the target style transfer model. The image loss function includes a low-rank loss function used to represent the difference between a first low-rank matrix and a second low-rank matrix, where the first low-rank matrix is obtained based on the N frames of sample content images and optical flow information, the second low-rank matrix is obtained based on N frames of predicted composite images and the optical flow information, the optical flow information is used to represent the position differences of corresponding pixels between two adjacent frames of the N frames of sample content images, and the N frames of predicted composite images refer to images obtained by performing image style transfer processing on the N frames of sample content images according to a sample style image through the target style transfer model.

It should be noted that the above N frames of sample content images refer to N adjacent frames in the video; the N frames of composite images refer to the images corresponding to the N frames of sample content images; and the above target style transfer network may refer to the pre-trained style transfer model obtained through the training method shown in FIG. 7.
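The deployed flow of S710 to S730 can be sketched as follows; OpenCV-based I/O, the file names, and the [0,1] output range of the model are assumptions for illustration, and the patent does not prescribe an I/O pipeline. The point of the sketch is that the trained target student model is applied frame by frame with no optical flow computation at test time.

```python
import cv2
import torch

@torch.no_grad()
def stylize_video(student, in_path="input.mp4", out_path="stylized.mp4"):
    cap = cv2.VideoCapture(in_path)                          # S710: obtain the video
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        x = torch.from_numpy(frame).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        y = student(x).clamp(0, 1)                           # S720: stylize one frame
        out = (y[0].permute(1, 2, 0).numpy() * 255).astype("uint8")
        writer.write(out)                                    # S730: assemble the video
    cap.release()
    writer.release()
```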

It should be understood that, for style transfer processing of a single-frame image, i.e., image style transfer, only the content of the content image and the style of the style image need to be considered; but for video, since a video includes multiple consecutive frames, video style transfer must consider not only the stylization effect of each image but also the stability across multiple frames; that is, it is necessary to ensure the smoothness of the video after style transfer processing and to avoid noise such as flicker and artifacts.

For example, assume the video includes 5 consecutive frames of images, i.e., 5 consecutive frames of sample content images, and that the 5-frame style transfer result obtained by performing style transfer processing on the 5 frames of sample content images is 5 frames of sample composite images. The low-rank matrix can be computed as follows:

Step 1: Calculate the optical flow information between image frames according to the 5 frames of images in the video, i.e., the 5 frames of sample content images.

Step 2: Calculate mask information according to the optical flow information, where the mask information can be used to represent the changed regions between two consecutive frames of images obtained according to the optical flow information.

Step 3: According to the optical flow information and the mask information, calculate the low-rank part after aligning the 1st, 2nd, 4th, and 5th frame images with the 3rd frame image, and set the sparse part to 0.

Step 4: Expand the aligned 5 frames of images into vectors and combine them column-wise into one matrix; this matrix can be a low-rank matrix.

It should be understood that, when computing the low-rank loss function, the goal is for the rank of the low-rank part of the image matrix formed from the 5 frames of sample content images to be approximated by the rank of the low-rank part of the image matrix formed from the 5 frames of sample composite images. The rank of the low-rank part can be continuously optimized by optimizing the nuclear norm, and the nuclear norm is obtained by performing singular value decomposition on the matrix.

In the embodiments of the present application, the goal of introducing the low-rank loss function when training the style transfer model is that the regions that appear in multiple adjacent frames of images in the original video and are not motion boundaries remain the same after style transfer processing; that is, the rank of those regions in the style-transferred video approximates the rank of the same regions in the original video, so that the stability of the style-transferred video can be improved.

Further, in the embodiments of the present application, the above image loss function further includes a residual loss function. The residual loss function is obtained according to the difference between a first sample composite image and a second sample composite image, where the first sample composite image is obtained by performing image style transfer processing on the N frames of sample content images through a first model, and the second sample composite image is obtained by performing image style transfer processing on the N frames of sample content images through a second model. The first model and the second model are image style transfer models pre-trained according to the sample style image; the second model includes an optical flow module, and the optical flow module is used to determine the optical flow information of the N frames of sample content images.

It should be understood that the difference between the first sample composite image and the second sample composite image may refer to the difference between the pixel values at corresponding positions of the first sample composite image and the second sample composite image.

It should be noted that the above first model and second model refer to style transfer models trained with the same style image; the difference between them is that the first model does not include an optical flow module while the second model does. That is, the first model and the second model may use the same sample content images and sample style image during training; for example, in the training phase the first model and the second model may refer to the same model. However, in the test phase the second model also needs to compute the optical flow information between multiple frames of sample content images, whereas the first model does not.

In the embodiments of the present application, the goal of introducing the residual loss function when training the target style transfer model is to enable the neural network model, during training, to learn the difference between the composite images output by the style transfer model that includes the optical flow module and the style transfer model that does not, thereby improving the stability of the style-transferred video obtained by the target transfer model.

Further, in the embodiments of the present application, a teacher-student model learning strategy may be adopted to meet the deployment requirements of mobile terminals; that is, the style transfer model to be trained may be a target student model. During training, the parameters of the student model can be updated through the image loss function to obtain the target student model.

It should be noted that the network structure of the student model is the same as that of the target student model; the student model may refer to a pre-trained style transfer model that does not require optical flow information as input in the test phase.

Optionally, in a possible implementation, the first model and the second model may be pre-trained teacher models, and the target style transfer model refers to the target student model obtained by training a student model to be trained according to the residual loss function and a knowledge distillation algorithm.

It should be understood that knowledge distillation is a key technique for miniaturizing deep learning models to meet the deployment requirements of terminal devices. Compared with compression techniques such as quantization and sparsification, it does not require specific hardware support to achieve model compression. Knowledge distillation adopts a teacher-student model learning strategy: the teacher model generally has a large number of parameters and cannot meet deployment requirements, while the student model has few parameters and can be deployed directly. By designing an effective knowledge distillation algorithm, the student model learns to imitate the behavior of the teacher model, effectively transferring knowledge so that the student model can ultimately exhibit the same processing capability as the teacher model.

In the embodiments of the present application, knowledge distillation can be performed on the model without an optical flow module at test time by using the model that includes an optical flow module at test time. In video style transfer, since the teacher model and the student model differ in structure and in training method, the stylization effects of the student model and the teacher model may not be exactly the same; if the student model directly learns the output of the teacher model at the pixel level, the student model's output may exhibit ghosting or blurring. In the embodiments of the present application, by learning the difference between the style transfer results output by the teacher model that includes the optical flow module at test time and the teacher model that does not, the ghosting caused by inconsistent styles of the teacher model and the student model can be effectively avoided, thereby improving the stability of the style-transferred video obtained by the target transfer model.

Optionally, in a possible implementation, the residual loss function is obtained according to the following equation:

L_res = Σ_i ||(N_T(x_i) − Ñ_T(x_i)) − (N_S(x_i) − Ñ_S(x_i))||²;

where L_res denotes the residual loss function; N_T denotes the second model; Ñ_T denotes the first model; N_S denotes the student model to be trained; Ñ_S denotes the pre-trained basic model, which has the same network structure as the student model to be trained; and x_i denotes the i-th frame of sample content image included in the sample video, with i being a positive integer.

In one example, the parameters of the neural network model are determined according to the image loss function between the N frames of sample composite images and the N frames of predicted composite images, where the image loss function includes the above low-rank loss function, and the low-rank loss function is used to represent the difference between the low-rank matrix formed from the N frames of sample content images and the low-rank matrix formed from the N frames of sample composite images.

In another example, the parameters of the neural network model are determined according to the image loss function between the N frames of sample composite images and the N frames of predicted composite images, where the image loss function includes the above low-rank loss function and the above residual loss function. The low-rank loss function is used to represent the difference between the low-rank matrix formed from the N frames of sample content images and the low-rank matrix formed from the N frames of sample composite images; the residual loss function is obtained according to the difference between the first sample composite image and the second sample composite image. The first sample composite image refers to the image obtained by performing image style transfer processing on the N frames of sample content images through the first model, and the second sample composite image refers to the image obtained by performing image style transfer processing on the N frames of sample content images through the second model; the first model and the second model are image style transfer models pre-trained according to the same sample style image; the second model includes an optical flow module and the first model does not; the optical flow module can be used to determine the optical flow information of the N frames of sample content images.

Optionally, in a possible implementation, the image loss function further includes a perceptual loss function, where the perceptual loss function includes a content loss and a style loss; the content loss is used to represent the image content difference between the N frames of predicted composite images and their corresponding N frames of sample content images, and the style loss is used to represent the image style difference between the N frames of predicted composite images and the sample style image.

The perceptual loss function can be used to represent the content similarity between a sample content image and the corresponding composite image, and the style similarity between the sample style image and the corresponding composite image.

In one example, the parameters of the neural network model are determined according to the image loss function between the N frames of sample composite images and the N frames of predicted composite images, where the image loss function includes the above low-rank loss function, the above residual loss function, and the above perceptual loss function.

For example, the image loss is obtained by weighting the above low-rank loss function, the above residual loss function, and the above perceptual loss function.

Optionally, in a possible implementation, the parameters of the target style transfer model are obtained through multiple iterations of a backpropagation algorithm based on the image loss function.

In the embodiments of the present application, a low-rank loss function is introduced in the process of obtaining the target style transfer model; by learning low-rank information, the stability of the style-transferred video can be synchronized with that of the original video, thereby improving the stability of the style-transferred video obtained by the target transfer model.

In addition, the target style transfer model in the embodiments of the present application may be a target student model obtained by adopting the teacher-student model learning strategy. On the one hand, this can meet the requirements of deploying the style transfer model on mobile devices; on the other hand, when training the target student model, the student model learns the difference between the output information of the teacher model that includes the optical flow module and the teacher model that does not, which effectively avoids the ghosting caused by inconsistent styles of the teacher model and the student model, thereby improving the stability of the style-transferred video obtained by the target transfer model.

Further, the target style transfer model provided in the embodiments of the present application does not need to compute the optical flow information between the multiple frames of images included in the video in the test phase, i.e., in the process of performing style transfer processing on the video to be processed; therefore, the target style transfer model provided by the embodiments of the present application can shorten the style transfer processing time while improving stability, improving the operating efficiency of the target style transfer model.

Exemplarily, FIG. 10 is a schematic diagram of the training phase and the testing phase provided by an embodiment of the present application.

Training phase:

For example, in the embodiments of the present application, the Flownet2 network may be used, a dataset with optical flow data may be generated from the Hollywood2 dataset, and the network may be trained using the training method shown in the embodiments of the present application.

For example, the specific implementation steps include: first, train a style transfer model, i.e., the pre-trained basic model Ñ_S, using only the perceptual loss function, or using the perceptual loss function together with the optical flow loss function; use video data and optical flow data to train a teacher model N_T that includes an optical flow module and a teacher model Ñ_T that does not include an optical flow module; then train the student model N_S to be trained according to N_T, Ñ_T, and the above low-rank loss function and residual loss function, finally obtaining the trained target student model, where the pre-trained basic model, the student model to be trained, and the target student model all have the same network structure.

It should be noted that, for the specific implementation of the training phase, reference may be made to the foregoing descriptions of FIG. 7 and FIG. 8, which are not repeated here.

Testing phase:

Exemplarily, during testing, test data may be input into the target student model, and the test result, i.e., the data after style transfer processing, can be obtained through the target student model.

It should be noted that, for the specific implementation of the testing phase, reference may be made to the foregoing description of FIG. 9, which is not repeated here.

Table 1

The teacher model in Table 1 may refer to the second model in the above embodiments, i.e., the style transfer model that includes the optical flow module in the test phase; the first type of student model may refer to a pre-trained student model obtained by training with the perceptual loss function; the second type of student model may refer to a pre-trained student model obtained by training with the perceptual loss function and the optical flow loss function; loss function 1 may refer to the residual loss function in this application; loss function 2 may refer to the low-rank loss function in this application; Alley_2, Ambush_5, Bandage_2, Market_6, and Temple_2 respectively denote the names of five video sequences in the MPI-Sintel dataset; and All denotes all five of the preceding videos together. Table 1 shows the stability test results of the different models on the MPI-Sintel dataset, where the stability metric can be computed with the following formula:

E = (1/(T−1)) Σ_{t=2}^{T} sqrt((1/D) ||M_t ⊙ (O_t − W_t(O_{t−1}))||²);

where T denotes the number of image frames included in the video; D = c*w*d; M_t ∈ R^{w*d} denotes the mask information; O_t denotes the style transfer result of frame t; O_{t−1} denotes the style transfer result of frame t−1; W_t denotes the optical flow information from frame t−1 to frame t; and W_t(O_{t−1}) denotes warping the style transfer result of frame t−1 into alignment with the style transfer result of frame t.
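A hedged NumPy sketch of this metric; the nearest-neighbor warp mirrors the earlier Step 1-4 sketch, and the flows and masks are assumed to be precomputed on the original video.

```python
import numpy as np

def warp(frame, flow):
    # nearest-neighbor backward warp, as in the earlier Step 1-4 sketch
    H, W, _ = frame.shape
    ys, xs = np.mgrid[0:H, 0:W]
    sx = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    sy = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    return frame[sy, sx]

def stability_error(outputs, flows, masks):
    # outputs: T stylized frames (H x W x c); flows[t]/masks[t] relate frame t-1 to t
    T = len(outputs)
    H, W, c = outputs[0].shape
    D = c * H * W
    errs = []
    for t in range(1, T):
        diff = masks[t][..., None] * (outputs[t] - warp(outputs[t - 1], flows[t]))
        errs.append(np.sqrt((diff ** 2).sum() / D))
    return sum(errs) / (T - 1)          # smaller is more stable
```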

As shown in Table 1, a smaller stability metric indicates better stability of the model's output after style transfer processing; from the test results shown in Table 1, it can be seen that the stability of the output of the target transfer model provided by the embodiments of the present application after style transfer processing is significantly better than that of the other models.

It should be understood that the above examples are intended to help those skilled in the art understand the embodiments of the present application, rather than to limit the embodiments of the present application to the specific values or specific scenarios illustrated. Those skilled in the art can obviously make various equivalent modifications or changes based on the above examples, and such modifications or changes also fall within the scope of the embodiments of the present application.

The method for training a style transfer model and the method for video style transfer provided by the embodiments of the present application are described in detail above with reference to FIG. 1 to FIG. 10; the apparatus embodiments of the present application are described in detail below with reference to FIG. 11 to FIG. 14. It should be understood that the image processing apparatus in the embodiments of the present application can execute the various methods of the foregoing embodiments of the present application; for the specific working processes of the following products, reference may be made to the corresponding processes in the foregoing method embodiments.

FIG. 11 is a schematic block diagram of an apparatus for video style transfer provided by an embodiment of the present application.

It should be understood that the apparatus 800 for video style transfer can execute the method shown in FIG. 9 or the method of the testing phase shown in FIG. 10. The apparatus 800 includes an acquisition unit 810 and a processing unit 820.

The acquisition unit 810 is configured to acquire a video to be processed, where the video to be processed includes N frames of content images to be processed and N is an integer greater than or equal to 2. The processing unit 820 is configured to perform image style transfer processing on the N frames of content images to be processed according to a target style transfer model to obtain N frames of composite images, and to obtain, according to the N frames of composite images, the style-transferred video corresponding to the video to be processed,

wherein the parameters of the target style transfer model are determined according to an image loss function of performing style transfer processing on N frames of sample content images by the target style transfer model. The image loss function includes a low-rank loss function; the low-rank loss function is used to represent the difference between a first low-rank matrix and a second low-rank matrix; the first low-rank matrix is obtained based on the N frames of sample content images and optical flow information; the second low-rank matrix is obtained based on N frames of predicted composite images and the optical flow information; the optical flow information is used to represent the position difference of corresponding pixels between two adjacent frames in the N frames of sample content images; and the N frames of predicted composite images are images obtained after the target style transfer model performs image style transfer processing on the N frames of sample content images according to a sample style image.
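By way of illustration only, the following PyTorch sketch shows one possible reading of the low-rank loss: every frame of each video is aligned to a common reference with the optical flow and flattened into one row of a matrix, and the nuclear norm (the usual convex surrogate for matrix rank) of the difference between the two matrices is penalized. The `warp` helper and all names are assumptions rather than the claimed construction.

```python
import torch

def low_rank_loss(content_frames, stylized_frames, warp):
    """Sketch of the low-rank loss between the two aligned frame matrices.

    content_frames / stylized_frames: lists of N tensors of equal shape.
    warp(frame, t): aligns frame t to the reference frame using the
    precomputed optical flow (implementation assumed elsewhere).
    """
    X = torch.stack([warp(f, t).flatten() for t, f in enumerate(content_frames)])
    O = torch.stack([warp(f, t).flatten() for t, f in enumerate(stylized_frames)])
    # Nuclear norm (sum of singular values) of the difference between the
    # first low-rank matrix (inputs) and the second one (stylized outputs).
    return torch.linalg.matrix_norm(O - X, ord='nuc')
```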

Optionally, in an embodiment, the image loss function further includes a residual loss function, and the residual loss function is obtained according to the difference between a first sample composite image and a second sample composite image,

wherein the first sample composite image is an image obtained by performing image style transfer processing on the N frames of sample content images through a first model, and the second sample composite image is an image obtained by performing image style transfer processing on the N frames of sample content images through a second model. The first model and the second model are image style transfer models pre-trained according to the sample style image; the second model includes an optical flow module, the first model does not include the optical flow module, and the optical flow module is used to determine the optical flow information of the N frames of sample content images.

Optionally, in an embodiment, the first model and the second model are pre-trained teacher models, and the target style transfer model is a target student model obtained by training a student model to be trained according to the residual loss function and a knowledge distillation algorithm.

Optionally, in an embodiment, the residual loss function is obtained according to the following equation:

$$L_{res}=\sum_{i=1}^{N}\left\|\left(N_T(x_i)-\tilde{N}_T(x_i)\right)-\left(N_S(x_i)-\tilde{N}_S(x_i)\right)\right\|_2^2$$

where L_res denotes the residual loss function; N_T denotes the second model; Ñ_T denotes the first model; N_S denotes the student model to be trained; Ñ_S denotes the pre-trained base model, which has the same network structure as the student model to be trained; and x_i denotes the i-th frame of sample content image included in the sample video, where i is a positive integer.
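By way of illustration only, a minimal PyTorch sketch of this residual distillation loss is given below; the four networks are passed in as callables, the teachers and the base model are treated as frozen, and all names are assumptions.

```python
import torch

def residual_loss(x, teacher_flow, teacher_plain, student, student_base):
    """Sketch of L_res: the residual between the teacher with the optical
    flow module (N_T) and the teacher without it (Ñ_T) supervises the
    residual between the student (N_S) and its frozen pre-trained base
    model (Ñ_S) of identical architecture.
    """
    with torch.no_grad():  # teachers and base model provide fixed targets
        delta_teacher = teacher_flow(x) - teacher_plain(x)  # N_T(x_i) − Ñ_T(x_i)
        base_out = student_base(x)
    delta_student = student(x) - base_out                   # N_S(x_i) − Ñ_S(x_i)
    return (delta_teacher - delta_student).pow(2).sum()
```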

Optionally, in an embodiment, the image loss function further includes a perceptual loss function, wherein the perceptual loss function includes a content loss and a style loss. The content loss is used to represent the image content difference between the N frames of predicted composite images and the corresponding N frames of sample content images, and the style loss is used to represent the image style difference between the N frames of predicted composite images and the sample style image.
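By way of illustration only, a common way to realize such a perceptual loss is with frozen VGG features: the content loss compares feature maps of the predicted composite image and its content frame, and the style loss compares Gram matrices of the predicted composite image and the style image. The layer indices and weighting below are assumptions, not the claimed configuration.

```python
import torch
import torchvision.models as models

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)  # the feature extractor stays frozen

def gram(f):
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)  # normalized Gram matrix

def perceptual_loss(pred, content, style, layers=(3, 8, 15, 22)):
    """pred/content/style: batched (B, 3, H, W) tensors, VGG-normalized."""
    c_loss = s_loss = 0.0
    x_p, x_c, x_s = pred, content, style
    for i, layer in enumerate(vgg):
        x_p, x_c, x_s = layer(x_p), layer(x_c), layer(x_s)
        if i in layers:
            c_loss = c_loss + (x_p - x_c).pow(2).mean()             # content loss
            s_loss = s_loss + (gram(x_p) - gram(x_s)).pow(2).sum()  # style loss
    return c_loss + s_loss
```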

Optionally, in an embodiment, the image loss function is obtained by weighting the low-rank loss function, the residual loss function, and the perceptual loss function.

Optionally, in an embodiment, the parameters of the target style transfer model are obtained through multiple iterations of a back-propagation algorithm based on the image loss function.

FIG. 12 is a schematic block diagram of a training apparatus for a style transfer model provided by an embodiment of this application.

It should be understood that the training apparatus 900 can execute the style transfer model training method shown in FIG. 7, FIG. 8, or FIG. 10. The training apparatus 900 includes an acquisition unit 910 and a processing unit 920.

The acquisition unit 910 is configured to acquire training data, where the training data includes N frames of sample content images, a sample style image, and N frames of composite images; the N frames of composite images are images obtained by performing image style transfer processing on the N frames of sample content images according to the sample style image, and N is an integer greater than or equal to 2. The processing unit 920 is configured to perform, through a neural network model, image style transfer processing on the N frames of sample content images according to the sample style image to obtain N frames of predicted composite images, and to determine the parameters of the neural network model according to an image loss function between the N frames of sample content images and the N frames of predicted composite images,

wherein the image loss function includes a low-rank loss function; the low-rank loss function is used to represent the difference between a first low-rank matrix and a second low-rank matrix; the first low-rank matrix is obtained based on the N frames of sample content images and optical flow information; the second low-rank matrix is obtained based on the N frames of predicted composite images and the optical flow information; and the optical flow information is used to represent the position difference of corresponding pixels between two adjacent frames in the N frames of sample content images.

Optionally, in an embodiment, the image loss function further includes a residual loss function, and the residual loss function is obtained according to the difference between a first sample composite image and a second sample composite image,

wherein the first sample composite image is an image obtained by performing image style transfer processing on the N frames of sample content images through a first model, and the second sample composite image is an image obtained by performing image style transfer processing on the N frames of sample content images through a second model. The first model and the second model are image style transfer models pre-trained according to the sample style image; the second model includes an optical flow module, the first model does not include the optical flow module, and the optical flow module is used to determine the optical flow information of the N frames of sample content images.

Optionally, in an embodiment, the first model and the second model are pre-trained teacher models, and the target style transfer model is a target student model obtained by training a student model to be trained according to the residual loss function and a knowledge distillation algorithm.

Optionally, in an embodiment, the residual loss function is obtained according to the following equation:

$$L_{res}=\sum_{i=1}^{N}\left\|\left(N_T(x_i)-\tilde{N}_T(x_i)\right)-\left(N_S(x_i)-\tilde{N}_S(x_i)\right)\right\|_2^2$$

where L_res denotes the residual loss function; N_T denotes the second model; Ñ_T denotes the first model; N_S denotes the student model to be trained; Ñ_S denotes the pre-trained base model, which has the same network structure as the student model to be trained; and x_i denotes the i-th frame of sample content image included in the sample video, where i is a positive integer.

Optionally, in an embodiment, the image loss function further includes a perceptual loss function, wherein the perceptual loss function includes a content loss and a style loss. The content loss is used to represent the image content difference between the N frames of predicted composite images and the corresponding N frames of sample content images, and the style loss is used to represent the image style difference between the N frames of predicted composite images and the sample style image.

Optionally, in an embodiment, the image loss function is obtained by weighting the low-rank loss function, the residual loss function, and the perceptual loss function.

Optionally, in an embodiment, the parameters of the target style transfer model are obtained through multiple iterations of a back-propagation algorithm based on the image loss function.
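By way of illustration only, one training iteration might combine the three losses sketched above into a weighted image loss and update the parameters by back-propagation; the callables, weights, and names below are assumptions.

```python
import torch

def train_step(student, frames, style_img, optimizer,
               low_rank_fn, residual_fn, perceptual_fn,
               weights=(1.0, 1.0, 1.0)):
    """One illustrative iteration of training the style transfer model.
    low_rank_fn / residual_fn / perceptual_fn are assumed to be closures
    over the frozen teachers and base model, e.g. the sketches above."""
    w_lr, w_res, w_per = weights
    optimizer.zero_grad()
    preds = [student(f) for f in frames]  # N frames of predicted composite images
    loss = (w_lr * low_rank_fn(frames, preds)
            + w_res * sum(residual_fn(f) for f in frames)
            + w_per * sum(perceptual_fn(p, f, style_img)
                          for p, f in zip(preds, frames)))
    loss.backward()    # back-propagate the weighted image loss
    optimizer.step()   # repeated over many iterations until convergence
    return float(loss.detach())
```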

It should be noted that the above apparatus 800 and training apparatus 900 are embodied in the form of functional units. The term "unit" here may be implemented in the form of software and/or hardware, which is not specifically limited.

For example, a "unit" may be a software program, a hardware circuit, or a combination of the two that implements the above functions. The hardware circuit may include an application-specific integrated circuit (ASIC), an electronic circuit, a processor (for example, a shared processor, a dedicated processor, or a group processor) and memory for executing one or more software or firmware programs, merged logic circuits, and/or other suitable components that support the described functions.

Therefore, the units of the examples described in the embodiments of this application can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.

FIG. 13 is a schematic diagram of the hardware structure of an apparatus for video style transfer provided by an embodiment of this application. The apparatus 1000 shown in FIG. 13 (which may specifically be a computer device) includes a memory 1010, a processor 1020, a communication interface 1030, and a bus 1040. The memory 1010, the processor 1020, and the communication interface 1030 are communicatively connected to each other through the bus 1040.

The memory 1010 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1010 may store a program; when the program stored in the memory 1010 is executed by the processor 1020, the processor 1020 is configured to execute the steps of the video style transfer method of the embodiments of this application, for example, the steps shown in FIG. 9.

It should be understood that the apparatus for video style transfer shown in this embodiment of this application may be a server, for example, a server in the cloud, or may be a chip configured in a server in the cloud.

The processor 1020 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the video style transfer method of the method embodiments of this application.

The processor 1020 may alternatively be an integrated circuit chip with signal processing capability. In an implementation process, the steps of the video style transfer method of this application may be completed by an integrated logic circuit of hardware in the processor 1020 or by instructions in the form of software.

The processor 1020 may alternatively be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed with reference to the embodiments of this application may be directly performed and completed by a hardware decoding processor, or performed and completed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1010; the processor 1020 reads the information in the memory 1010 and, in combination with its hardware, completes the functions to be executed by the units included in the apparatus shown in FIG. 11 of this application, or executes the video style transfer method shown in FIG. 9 of the method embodiments of this application.

The communication interface 1030 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the apparatus 1000 and other devices or communication networks.

The bus 1040 may include a path for transferring information between the components of the apparatus 1000 (for example, the memory 1010, the processor 1020, and the communication interface 1030).

FIG. 14 is a schematic diagram of the hardware structure of a training apparatus for a style transfer model provided by an embodiment of this application. The training apparatus 1100 shown in FIG. 14 (which may specifically be a computer device) includes a memory 1110, a processor 1120, a communication interface 1130, and a bus 1140. The memory 1110, the processor 1120, and the communication interface 1130 are communicatively connected to each other through the bus 1140.

The memory 1110 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1110 may store a program; when the program stored in the memory 1110 is executed by the processor 1120, the processor 1120 is configured to execute the steps of the style transfer model training method of the embodiments of this application, for example, the steps shown in FIG. 7 or FIG. 8.

It should be understood that the training apparatus shown in this embodiment of this application may be a server, for example, a server in the cloud, or may be a chip configured in a server in the cloud.

By way of example, the processor 1120 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the style transfer model training method of the method embodiments of this application.

By way of example, the processor 1120 may alternatively be an integrated circuit chip with signal processing capability. In an implementation process, the steps of the style transfer model training method of this application may be completed by an integrated logic circuit of hardware in the processor 1120 or by instructions in the form of software.

The processor 1120 may alternatively be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed with reference to the embodiments of this application may be directly performed and completed by a hardware decoding processor, or performed and completed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1110; the processor 1120 reads the information in the memory 1110 and, in combination with its hardware, completes the functions to be executed by the units included in the training apparatus shown in FIG. 12, or executes the style transfer model training method shown in FIG. 7 or FIG. 8 of the method embodiments of this application.

The communication interface 1130 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the training apparatus 1100 and other devices or communication networks.

The bus 1140 may include a path for transferring information between the components of the training apparatus 1100 (for example, the memory 1110, the processor 1120, and the communication interface 1130).

It should be noted that although the above apparatus 1000 and training apparatus 1100 show only a memory, a processor, and a communication interface, those skilled in the art should understand that, in a specific implementation process, the apparatus 1000 and the training apparatus 1100 may also include other components necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the apparatus 1000 and the training apparatus 1100 may also include hardware components implementing other additional functions. In addition, those skilled in the art should understand that the apparatus 1000 and the training apparatus 1100 may alternatively include only the components necessary to implement the embodiments of this application, and need not include all the components shown in FIG. 13 or FIG. 14.

By way of example, an embodiment of this application further provides a chip, where the chip includes a transceiver unit and a processing unit. The transceiver unit may be an input/output circuit or a communication interface; the processing unit is a processor, a microprocessor, or an integrated circuit integrated on the chip; and the chip can execute the video style transfer method of the foregoing method embodiments.

By way of example, an embodiment of this application further provides a chip, where the chip includes a transceiver unit and a processing unit. The transceiver unit may be an input/output circuit or a communication interface; the processing unit is a processor, a microprocessor, or an integrated circuit integrated on the chip; and the chip can execute the style transfer model training method of the foregoing method embodiments.

By way of example, an embodiment of this application further provides a computer-readable storage medium that stores instructions which, when executed, perform the video style transfer method of the foregoing method embodiments.

By way of example, an embodiment of this application further provides a computer-readable storage medium that stores instructions which, when executed, perform the style transfer model training method of the foregoing method embodiments.

By way of example, an embodiment of this application further provides a computer program product containing instructions which, when executed, perform the video style transfer method of the foregoing method embodiments.

By way of example, an embodiment of this application further provides a computer program product containing instructions which, when executed, perform the style transfer model training method of the foregoing method embodiments.

It should be understood that the processor in the embodiments of this application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

It should also be understood that the memory in the embodiments of this application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example rather than limitation, many forms of random access memory (RAM) are available, such as a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchlink dynamic random access memory (SLDRAM), and a direct rambus random access memory (DR RAM).

The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used for implementation, the foregoing embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner or a wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or a data center, that includes one or more sets of available media. The available media may be magnetic media (for example, a floppy disk, a hard disk, or a magnetic tape), optical media (for example, a DVD), or semiconductor media. The semiconductor media may be a solid-state drive.

It should be understood that the term "and/or" in this specification is merely an association relationship describing associated objects, indicating that three relationships may exist. For example, A and/or B may indicate the following three cases: only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. In addition, the character "/" in this specification generally indicates an "or" relationship between the associated objects, but may also indicate an "and/or" relationship; reference may be made to the context for understanding.

In this application, "at least one" means one or more, and "a plurality of" means two or more. "At least one of the following items" or a similar expression means any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may be singular or plural.

It should be understood that, in the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation processes of the embodiments of this application.

A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed in this specification can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.

It can be clearly understood by a person skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a logical function division, and there may be other divisions in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces; the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each of the units may exist alone physically, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or part of the technical solutions may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (30)

1. A method for training a style migration model, comprising:
acquiring training data, wherein the training data comprises N frames of sample content images, sample style images and N frames of synthesized images, the N frames of synthesized images are images obtained by performing image style migration processing on the N frames of sample content images according to the sample style images, and N is an integer greater than or equal to 2;
performing image style migration processing on the N frames of sample content images according to the sample style images through a neural network model to obtain N frames of predicted composite images;
determining parameters of the neural network model according to an image loss function between the N frames of sample content images and the N frames of predicted composite images,
the image loss function comprises a low-rank loss function, the low-rank loss function is used for representing the difference between a first low-rank matrix and a second low-rank matrix, the first low-rank matrix is obtained based on the N-frame sample content image and optical flow information, the second low-rank matrix is obtained based on the N-frame prediction synthesized image and the optical flow information, and the optical flow information is used for representing the position difference of corresponding pixel points between two adjacent frame images in the N-frame sample content image.
2. The training method of claim 1 wherein the image loss function further comprises a residual loss function derived from a difference between the first sample composite image and the second sample composite image,
the first sample synthesized image is an image obtained by performing image style migration processing on the N frames of sample content images through a first model, the second sample synthesized image is an image obtained by performing image style migration processing on the N frames of sample content images through a second model, the first model and the second model are image style migration models trained in advance according to the sample style images, the second model comprises an optical flow module, the first model does not comprise the optical flow module, and the optical flow module is used for determining optical flow information.
3. The training method of claim 2, wherein the first model and the second model are pre-trained teacher models, and the target style migration model is a target student model obtained by training a student model to be trained according to the residual loss function and a knowledge distillation algorithm.
4. The training method of claim 3, wherein the residual loss function is obtained according to the following equation,
wherein
$$L_{res}=\sum_{i=1}^{N}\left\|\left(N_T(x_i)-\tilde{N}_T(x_i)\right)-\left(N_S(x_i)-\tilde{N}_S(x_i)\right)\right\|_2^2,$$
L_res represents the residual loss function; N_T represents the second model; Ñ_T represents the first model; N_S represents the student model to be trained; Ñ_S represents a pre-trained base model, the pre-trained base model being the same as the student model to be trained in network structure; and x_i represents an i-th frame sample content image included in the N frames of sample content images, i being a positive integer.
5. Training method according to any of the claims 2-4, wherein the image loss function further comprises a perceptual loss function, wherein the perceptual loss function comprises a content loss representing an image content difference between the N-frame predictive composite image and the N-frame sample content image corresponding thereto and a style loss representing an image style difference between the N-frame predictive composite image and the sample style image.
6. The training method of claim 5 wherein said image loss function is obtained by weighting said low rank loss function, said residual loss function, and said perceptual loss function.
7. Training method according to claim 3 or 4, wherein the parameters of the target style migration model are obtained by a number of iterations of a back propagation algorithm based on the image loss function.
8. A method of video style migration, comprising:
acquiring a video to be processed, wherein the video to be processed comprises N frames of content images to be processed, and N is an integer greater than or equal to 2;
performing image style migration processing on the N frames of content images to be processed according to a target style migration model to obtain N frames of synthesized images;
obtaining a video after style migration processing corresponding to the video to be processed according to the N frames of synthesized images,
the parameters of the target style migration model are determined according to an image loss function of performing style migration processing on N frames of sample content images by the target style migration model, the image loss function comprises a low-rank loss function, the low-rank loss function is used for representing the difference between a first low-rank matrix and a second low-rank matrix, the first low-rank matrix is obtained based on the N frames of sample content images and optical flow information, the second low-rank matrix is obtained based on N frames of predicted synthesized images and the optical flow information, the optical flow information is used for representing the position difference of corresponding pixel points between two adjacent frames of images in the N frames of sample content images, and the N frames of predicted synthesized images are images obtained after performing image style migration processing on the N frames of sample content images according to the sample style images by the target style migration model.
9. The method of claim 8, wherein the image loss function further comprises a residual loss function derived from a difference between the first sample composite image and the second sample composite image,
the first sample synthesized image is an image obtained by performing image style migration processing on the N frames of sample content images through a first model, the second sample synthesized image is an image obtained by performing image style migration processing on the N frames of sample content images through a second model, the first model and the second model are image style migration models trained in advance according to the sample style images, the second model comprises an optical flow module, the first model does not comprise the optical flow module, and the optical flow module is used for determining optical flow information.
10. The method of claim 9, wherein the first model and the second model are pre-trained teacher models, and the target style migration model is a target student model obtained by training a student model to be trained according to the residual loss function and a knowledge distillation algorithm.
11. The method of claim 10, wherein the residual loss function is derived from the equation,
wherein
$$L_{res}=\sum_{i=1}^{N}\left\|\left(N_T(x_i)-\tilde{N}_T(x_i)\right)-\left(N_S(x_i)-\tilde{N}_S(x_i)\right)\right\|_2^2,$$
L_res represents the residual loss function; N_T represents the second model; Ñ_T represents the first model; N_S represents the student model to be trained; Ñ_S represents a pre-trained style migration model, the pre-trained style migration model having the same network structure as the student model to be trained; and x_i represents an i-th frame sample content image included in the N frames of sample content images, i being a positive integer.
12. The method of any of claims 9 to 11, wherein the image loss function further comprises a perceptual loss function, wherein the perceptual loss function comprises a content loss representing an image content difference between the N-frame predicted composite image and the N-frame sample content image corresponding thereto and a style loss representing an image style difference between the N-frame predicted composite image and the sample style image.
13. The method of claim 12, wherein the image loss function is obtained by weighting the low rank loss function, the residual loss function, and the perceptual loss function.
14. The method according to any one of claims 8 to 11, wherein the parameters of the target style migration model are obtained by a plurality of iterations of a back propagation algorithm based on the image loss function.
15. A training device for a style migration model, comprising:
the training device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring training data, the training data comprises N frames of sample content images, sample style images and N frames of synthesized images, the N frames of synthesized images are images obtained by performing image style migration processing on the N frames of sample content images according to the sample style images, and N is an integer greater than or equal to 2;
the processing unit is used for performing image style migration processing on the N frames of sample content images according to the sample style images through a neural network model to obtain N frames of predicted composite images; determining parameters of the neural network model according to an image loss function between the N frames of sample content images and the N frames of predicted composite images,
the image loss function comprises a low-rank loss function, the low-rank loss function is used for representing the difference between a first low-rank matrix and a second low-rank matrix, the first low-rank matrix is obtained based on the N-frame sample content image and optical flow information, the second low-rank matrix is obtained based on the N-frame prediction synthesized image and the optical flow information, and the optical flow information is used for representing the position difference of corresponding pixel points between two adjacent frame images in the N-frame sample content image.
16. The training device of claim 15 wherein the image loss function further comprises a residual loss function derived from a difference between the first sample composite image and the second sample composite image,
the first sample synthesized image is an image obtained by performing image style migration processing on the N frames of sample content images through a first model, the second sample synthesized image is an image obtained by performing image style migration processing on the N frames of sample content images through a second model, the first model and the second model are image style migration models trained in advance according to the sample style images, the second model comprises an optical flow module, the first model does not comprise the optical flow module, and the optical flow module is used for determining optical flow information.
17. The training apparatus of claim 16 wherein the first model and the second model are pre-trained teacher models and the target style migration model is a target student model obtained by training a student model to be trained according to the residual loss function and a knowledge distillation algorithm.
18. The training apparatus of claim 17 wherein said residual loss function is derived from the following equation,
wherein
$$L_{res}=\sum_{i=1}^{N}\left\|\left(N_T(x_i)-\tilde{N}_T(x_i)\right)-\left(N_S(x_i)-\tilde{N}_S(x_i)\right)\right\|_2^2,$$
L_res represents the residual loss function; N_T represents the second model; Ñ_T represents the first model; N_S represents the student model to be trained; Ñ_S represents a pre-trained base model, the pre-trained base model being the same as the student model to be trained in network structure; and x_i represents an i-th frame sample content image included in the N frames of sample content images, i being a positive integer.
19. The training apparatus of any of claims 16 to 18 wherein the image loss function further comprises a perceptual loss function, wherein the perceptual loss function comprises a content loss representing an image content difference between the N-frame predicted composite image and the N-frame sample content image corresponding thereto and a style loss representing an image style difference between the N-frame predicted composite image and the sample style image.
20. The training apparatus of claim 19 wherein the image loss function is obtained by weighting the low rank loss function, the residual loss function, and the perceptual loss function.
21. Training apparatus according to claim 17 or 18, wherein the parameters of the target style migration model are derived by a number of iterations of a back propagation algorithm based on the image loss function.
22. An apparatus for video style migration, comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a video to be processed, the video to be processed comprises N frames of content images to be processed, and N is an integer greater than or equal to 2;
the processing unit is used for carrying out image style migration processing on the N frames of content images to be processed according to the target style migration model to obtain N frames of synthesized images; obtaining a video after style migration processing corresponding to the video to be processed according to the N frames of synthesized images,
the parameters of the target style migration model are determined according to an image loss function of performing style migration processing on N frames of sample content images by the target style migration model, the image loss function comprises a low-rank loss function, the low-rank loss function is used for representing the difference between a first low-rank matrix and a second low-rank matrix, the first low-rank matrix is obtained based on the N frames of sample content images and optical flow information, the second low-rank matrix is obtained based on N frames of predicted synthesized images and the optical flow information, the optical flow information is used for representing the position difference of corresponding pixel points between two adjacent frames of images in the N frames of sample content images, and the N frames of predicted synthesized images are images obtained after performing image style migration processing on the N frames of sample content images according to the sample style images by the target style migration model.
23. The apparatus of claim 22, wherein the image loss function further comprises a residual loss function derived from a difference between the first sample composite image and the second sample composite image,
the first sample synthesized image is an image obtained by performing image style migration processing on the N frames of sample content images through a first model, the second sample synthesized image is an image obtained by performing image style migration processing on the N frames of sample content images through a second model, the first model and the second model are image style migration models trained in advance according to the sample style images, the second model comprises an optical flow module, the first model does not comprise the optical flow module, and the optical flow module is used for determining optical flow information.
24. The apparatus of claim 23, wherein the first model and the second model are pre-trained teacher models, and the target style migration model is a target student model obtained by training a student model to be trained according to the residual loss function and a knowledge distillation algorithm.
25. The apparatus of claim 24, wherein the residual loss function is derived from the following equation,
wherein
$$L_{res}=\sum_{i=1}^{N}\left\|\left(N_T(x_i)-\tilde{N}_T(x_i)\right)-\left(N_S(x_i)-\tilde{N}_S(x_i)\right)\right\|_2^2,$$
L_res represents the residual loss function; N_T represents the second model; Ñ_T represents the first model; N_S represents the student model to be trained; Ñ_S represents a pre-trained base model, the pre-trained base model being the same as the student model to be trained in network structure; and x_i represents an i-th frame sample content image included in the N frames of sample content images, i being a positive integer.
26. The apparatus of any of claims 23 to 25, wherein the image loss function further comprises a perceptual loss function, wherein the perceptual loss function comprises a content loss representing an image content difference between the N-frame predicted composite image and the N-frame sample content image corresponding thereto and a style loss representing an image style difference between the N-frame predicted composite image and the sample style image.
27. The apparatus of claim 26, wherein the image loss function is obtained by weighting the low rank loss function, the residual loss function, and the perceptual loss function.
28. The apparatus of any one of claims 22 to 25, wherein the parameters of the target style migration model are derived by a plurality of iterations of a back propagation algorithm based on the image loss function.
29. A training device for a style migration model, comprising a processor and a memory, the memory for storing program instructions, the processor for invoking the program instructions to perform the method of any of claims 1-7 or 8-14.
30. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein program instructions, which when executed by a processor, implement the method of any of claims 1 to 7 or 8 to 14.
CN202010409043.0A 2020-05-14 2020-05-14 Method for training style transfer model, method and device for video style transfer Active CN111667399B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010409043.0A CN111667399B (en) 2020-05-14 2020-05-14 Method for training style transfer model, method and device for video style transfer

Publications (2)

Publication Number Publication Date
CN111667399A (en) 2020-09-15
CN111667399B (en) 2023-08-25

Family

ID=72383795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010409043.0A Active CN111667399B (en) 2020-05-14 2020-05-14 Method for training style transfer model, method and device for video style transfer

Country Status (1)

Country Link
CN (1) CN111667399B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210358164A1 (en) * 2020-05-15 2021-11-18 Nvidia Corporation Content-aware style encoding using neural networks
CN114490922B (en) * 2020-10-27 2024-10-18 华为技术有限公司 Natural language understanding model training method and device
CN114494566B (en) * 2020-11-09 2025-02-07 华为技术有限公司 Image rendering method and device
CN112365556B (en) * 2020-11-10 2021-09-28 成都信息工程大学 Image extension method based on perception loss and style loss
CN114615421B (en) * 2020-12-07 2023-06-30 华为技术有限公司 Image processing method and electronic equipment
CN112734627B (en) * 2020-12-24 2023-07-11 北京达佳互联信息技术有限公司 Training method of image style migration model, image style migration method and device
CN113559513A (en) * 2021-01-19 2021-10-29 腾讯科技(深圳)有限公司 Motion generation model training method, motion generation method, device and equipment
CN112785493B (en) * 2021-01-22 2024-02-09 北京百度网讯科技有限公司 Model training method, style migration method, device, equipment and storage medium
CN113822957B (en) * 2021-02-26 2024-10-18 北京沃东天骏信息技术有限公司 Method and apparatus for synthesizing image
CN113076685B (en) * 2021-03-04 2024-09-10 华为技术有限公司 Image reconstruction model training method, image reconstruction method and device
CN113033566B (en) * 2021-03-19 2022-07-08 北京百度网讯科技有限公司 Model training method, recognition method, device, storage medium, and program product
CN113362243B (en) * 2021-06-03 2024-06-11 Oppo广东移动通信有限公司 Model training method, image processing method and device, medium and electronic equipment
CN113327265B (en) * 2021-06-10 2022-07-15 厦门市美亚柏科信息股份有限公司 Optical flow estimation method and system based on guiding learning strategy
CN113570636B (en) * 2021-06-16 2024-05-10 北京农业信息技术研究中心 Method and device for detecting ventilation quantity of fan
CN113947525B (en) * 2021-11-25 2025-07-15 中山大学 An unsupervised action style transfer method based on reversible flow network
CN116362956A (en) * 2021-12-24 2023-06-30 北京字跳网络技术有限公司 Video texture migration method, device, electronic equipment and storage medium
CN114511441A (en) * 2022-01-27 2022-05-17 北京奇艺世纪科技有限公司 Model training, image stylization method, device, electronic device and storage medium
CN114596469A (en) * 2022-03-17 2022-06-07 北京达佳互联信息技术有限公司 Image classification method and device, electronic equipment and storage medium
CN114581341B (en) * 2022-03-28 2025-06-03 杭州师范大学 Image style transfer method and system based on deep learning
CN114863212A (en) * 2022-05-06 2022-08-05 中国人民解放军63729部队 An infrared image target detection method based on shared feature network
CN115240102A (en) * 2022-06-21 2022-10-25 有米科技股份有限公司 Model training method and device based on images and texts
CN117808934A (en) * 2022-09-29 2024-04-02 华为技术有限公司 Data processing method and related equipment
CN117078790B (en) * 2023-10-13 2024-03-29 腾讯科技(深圳)有限公司 Image generation method, device, computer equipment and storage medium
CN117541625B (en) * 2024-01-05 2024-03-29 大连理工大学 A video multi-target tracking method based on domain adaptive feature fusion


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140156575A1 (en) * 2012-11-30 2014-06-05 Nuance Communications, Inc. Method and Apparatus of Processing Data Using Deep Belief Networks Employing Low-Rank Matrix Factorization
US9639761B2 (en) * 2014-03-10 2017-05-02 Mitsubishi Electric Research Laboratories, Inc. Method for extracting low-rank descriptors from images and videos for querying, classification, and object detection
US20170169563A1 (en) * 2015-12-11 2017-06-15 Macau University Of Science And Technology Low-Rank and Sparse Matrix Decomposition Based on Schatten p=1/2 and L1/2 Regularizations for Separation of Background and Dynamic Components for Dynamic MRI
US10152768B2 (en) * 2017-04-14 2018-12-11 Facebook, Inc. Artifact reduction for image style transfer
US10318889B2 (en) * 2017-06-26 2019-06-11 Konica Minolta Laboratory U.S.A., Inc. Targeted data augmentation using neural style transfer

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840531A (en) * 2017-11-24 2019-06-04 华为技术有限公司 Method and apparatus for training a multi-label classification model
CN108537776A (en) * 2018-03-12 2018-09-14 维沃移动通信有限公司 Image style transfer model generation method and mobile terminal
CN109859096A (en) * 2018-12-28 2019-06-07 北京达佳互联信息技术有限公司 Image style transfer method and apparatus, electronic device, and storage medium
CN109766895A (en) * 2019-01-03 2019-05-17 京东方科技集团股份有限公司 Training method of convolutional neural network for image style transfer, and image style transfer method
CN110175951A (en) * 2019-05-16 2019-08-27 西安电子科技大学 Video style transfer method based on temporal consistency constraint
CN110598781A (en) * 2019-09-05 2019-12-20 Oppo广东移动通信有限公司 Image processing method and apparatus, electronic device, and storage medium
CN111079781A (en) * 2019-11-07 2020-04-28 华南理工大学 Lightweight convolutional neural network image recognition method based on low-rank and sparse decomposition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Monkey image classification method based on improved VGG16; Tian Jialu; Deng Liguo; Information Technology and Network Security (Issue 05); full text *

Also Published As

Publication number Publication date
CN111667399A (en) 2020-09-15

Similar Documents

Publication Publication Date Title
CN111667399B (en) Method for training style transfer model, method and device for video style transfer
WO2022042713A1 (en) Deep learning training method and apparatus for use in computing device
CN110175671B (en) Neural network construction method, image processing method and device
CN113284054B (en) Image enhancement method and image enhancement device
WO2021218517A1 (en) Method for acquiring neural network model, and image processing method and apparatus
CN112446398B (en) Image classification method and device
CN113011562B (en) Model training method and device
CN111882031B (en) A neural network distillation method and device
WO2021043168A1 (en) Person re-identification network training method and person re-identification method and apparatus
CN112215332B (en) Searching method, image processing method and device for neural network structure
WO2021164750A1 (en) Method and apparatus for convolutional layer quantization
WO2021057056A1 (en) Neural architecture search method, image processing method and device, and storage medium
US20250104397A1 (en) Image classification method and apparatus
WO2022007867A1 (en) Method and device for constructing neural network
CN111797882B (en) Image classification method and device
CN112258565B (en) Image processing method and device
CN110222717A (en) Image processing method and device
CN113191489B (en) Training method of binary neural network model, image processing method and device
CN112561028B (en) Method for training neural network model, method and device for data processing
CN110222718B (en) Image processing method and device
WO2023071658A1 (en) Ai model processing method and apparatus, and ai model computing method and apparatus
CN112464930A (en) Target detection network construction method, target detection method, device and storage medium
WO2022156475A1 (en) Neural network model training method and apparatus, and data processing method and apparatus
CN115424085A (en) Model training method and device
WO2022227024A1 (en) Operational method and apparatus for neural network model and training method and apparatus for neural network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant