CN112862671B

CN112862671B - Video image editing and repairing method, device and storage medium

Info

Publication number: CN112862671B
Application number: CN202110180041.3A
Authority: CN
Inventors: 黄高; 陶辰昕; 宋士吉
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2021-02-09
Filing date: 2021-02-09
Publication date: 2024-07-19
Anticipated expiration: 2041-02-09
Also published as: CN112862671A

Abstract

A video image editing method comprising: selecting a frame of target frame image and a frame of auxiliary frame image in an input video, and respectively pairing to obtain a plurality of pairing combinations, wherein each pairing combination comprises the target frame image and the frame of auxiliary frame image; extracting a pair of optical flows for each pairing combination respectively, including: reverse optical flow from the target frame image to the auxiliary frame image in the paired set, forward optical flow from the auxiliary frame image to the target frame image in the paired set; obtaining consistency marks of the images according to consistency judgment results of the reverse optical flow and the forward optical flow in each pair of optical flows; obtaining a segmentation mark corresponding to each instance in the target frame image; and respectively determining the coincidence degree of the segmentation marks corresponding to each instance and the coincidence marks, and combining the segmentation marks with the coincidence degrees higher than the preset condition to obtain combined marks, wherein the combined marks are used for marking the result of removing the region to be removed in the target frame image.

Description

Video image editing, repairing method, device and storage medium

技术领域Technical Field

本文涉及但不限于计算机视觉与数字图像处理领域，尤其涉及视频图像编辑、修复的方法、装置及存储介质。This article relates to but is not limited to the fields of computer vision and digital image processing, and in particular to methods, devices and storage media for video image editing and restoration.

背景技术Background technique

随着互联网技术的不断发展以及智能手机的广泛普及，手机摄影已经成为了人们生活中的重要组成部分，图像/视频编辑也成为图像处理领域中一个愈发重要的问题。在图像/视频编辑技术中，图像/视频修复(也可称为物体移除)尤其困难，它的目的是移除图像/视频中的非主体物体，并且填充被遮挡的区域。在一些修复方法中，标注需要移除的区域需要手工完成，这导致了使用上的不便。With the continuous development of Internet technology and the widespread popularity of smart phones, mobile photography has become an important part of people's lives, and image/video editing has become an increasingly important issue in the field of image processing. In image/video editing technology, image/video restoration (also known as object removal) is particularly difficult. Its purpose is to remove non-subject objects in images/videos and fill in occluded areas. In some restoration methods, marking the areas that need to be removed needs to be done manually, which leads to inconvenience in use.

发明内容Summary of the invention

以下是对本文详细描述的主题的概述。本概述并非是为了限制权利要求的保护范围。The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.

本申请实施例提供了一种视频图像编辑方法，可以解决在图像/视频修复时，只能依靠手工标注需要移除的区域的问题。The embodiment of the present application provides a video image editing method, which can solve the problem of having to rely on manual marking of areas that need to be removed during image/video repair.

本申请实施例提供了一种视频图像编辑方法，包括：The present application provides a video image editing method, including:

选取输入视频中的一帧目标帧图像与多帧辅助帧图像；Selecting a target frame image and multiple auxiliary frame images from the input video;

将所述目标帧图像与多帧辅助帧图像分别进行配对，得到多个配对组合，每个配对组合包括所述目标帧图像和一帧辅助帧图像；Pairing the target frame image with multiple auxiliary frame images respectively to obtain multiple pairing combinations, each pairing combination including the target frame image and one auxiliary frame image;

对于每个配对组合分别提取一对光流，包括：反向光流和前向光流；所述反向光流为从所述目标帧图像到该配对组合中所述辅助帧图像的光流，所述前向光流为从该配对组合中所述辅助帧图像到所述目标帧图像的光流；For each pairing combination, a pair of optical flows is extracted respectively, including: a reverse optical flow and a forward optical flow; the reverse optical flow is an optical flow from the target frame image to the auxiliary frame image in the pairing combination, and the forward optical flow is an optical flow from the auxiliary frame image in the pairing combination to the target frame image;

根据每一对光流中反向光流和前向光流的一致性判断结果，得到图像的一致性标记；According to the consistency judgment result of the reverse optical flow and the forward optical flow in each pair of optical flows, the consistency mark of the image is obtained;

获取目标帧图像中的每个实例对应的分割标记；Obtain the segmentation mark corresponding to each instance in the target frame image;

分别确定所述每个实例对应的分割标记与一致性标记的重合度，利用所述重合度高于预设条件的分割标记，与所述一致性标记组合得到组合标记，所述组合标记用于标识出所述目标帧图像中去除待移除区域后的结果。Determine the degree of overlap between the segmentation mark and the consistency mark corresponding to each instance respectively, and use the segmentation mark with a degree of overlap higher than a preset condition to combine with the consistency mark to obtain a combined mark, and the combined mark is used to identify the result after removing the area to be removed in the target frame image.

本申请实施例还提供了一种视频图像修复方法，包括：上述的视频图像编辑方法，所述方法还包括：The embodiment of the present application also provides a video image repair method, including: the above-mentioned video image editing method, the method further comprising:

分别将每一帧所述辅助帧图像按照前向光流进行传播，根据所述组合标记选取传播结果中需要的像素，将所有选取的像素在对应位置上取均值作为填充内容填入目标帧图像中，得到修复后的图像。。Each auxiliary frame image is propagated according to the forward optical flow, and the pixels required in the propagation result are selected according to the combined mark, and the average of all the selected pixels at the corresponding position is taken as the filling content to fill the target frame image to obtain the repaired image.

本申请实施例还提供了一种视频图像编辑装置，包括存储器和处理器，所述存储器用于保存进行视频图像编辑的程序；所述处理器用于读取所述进行视频图像编辑的程序，并执行如上所述的视频图像编辑方法。An embodiment of the present application also provides a video image editing device, including a memory and a processor, wherein the memory is used to store a program for video image editing; the processor is used to read the program for video image editing and execute the video image editing method as described above.

本申请实施例还提供了一种视频图像修复装置，包括存储器和处理器，所述存储器用于保存进行视频图像修复的程序；所述处理器用于读取所述进行视频图像修复的程序，并执行如上所述的视频图像修复方法。An embodiment of the present application also provides a video image repair device, including a memory and a processor, wherein the memory is used to store a program for performing video image repair; the processor is used to read the program for performing video image repair and execute the video image repair method as described above.

本申请实施例还提供了一种计算机可读存储介质，存储有计算机可执行指令，所述计算机可执行指令用于执行上述视频图像编辑方法或者视频图像修复方法。The embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to execute the above-mentioned video image editing method or video image repair method.

本申请实施例利用视频中的时间序列关系，能够自动识别输入视频中需要移除的非主体物体、自动去除图像中需要移除的区域，从而能够实现图像/视频的自动修复，修复效果好，处理速度快，节省了人力，使用方便。The embodiment of the present application utilizes the time series relationship in the video, and can automatically identify non-subject objects that need to be removed in the input video and automatically remove areas that need to be removed in the image, thereby achieving automatic repair of the image/video with good repair effect, fast processing speed, manpower saving, and easy use.

本申请实施例的其它特征和优点将在随后的说明书中阐述，并且，部分地从说明书中变得显而易见，或者通过实施本申请实施例而了解。本申请实施例的其他优点可通过在说明书以及附图中所描述的方案来实现和获得。Other features and advantages of the embodiments of the present application will be described in the subsequent description, and partly become apparent from the description, or understood by implementing the embodiments of the present application. Other advantages of the embodiments of the present application can be realized and obtained by the schemes described in the description and the drawings.

在阅读并理解了附图和详细描述后，可以明白其他方面。Other aspects will be apparent upon reading and understanding the drawings and detailed description.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

附图用来提供对本申请实施例技术方案的理解，并且构成说明书的一部分，与本申请的实施例一起用于解释本申请的技术方案，并不构成对本申请实施例技术方案的限制。The accompanying drawings are used to provide an understanding of the technical solutions of the embodiments of the present application and constitute a part of the specification. Together with the embodiments of the present application, they are used to explain the technical solutions of the present application and do not constitute a limitation on the technical solutions of the embodiments of the present application.

图1为本申请实施例中视频图像编辑方法的流程图；FIG1 is a flow chart of a video image editing method according to an embodiment of the present application;

图2为本申请实施例中视频图像编辑或者视频图像修复装置的示意图；FIG2 is a schematic diagram of a video image editing or video image repairing device in an embodiment of the present application;

图3为本申请示例中的视频图像编辑、修复方法的流程图。FIG3 is a flow chart of a video image editing and repairing method in an example of the present application.

具体实施方式Detailed ways

本申请描述了多个实施例，但是该描述是示例性的，而不是限制性的，并且对于本领域的普通技术人员来说显而易见的是，在本申请所描述的实施例包含的范围内可以有更多的实施例和实现方案。尽管在附图中示出了许多可能的特征组合，并在具体实施方式中进行了讨论，但是所公开的特征的许多其它组合方式也是可能的。除非特意加以限制的情况以外，任何实施例的任何特征或元件可以与任何其它实施例中的任何其他特征或元件结合使用，或可以替代任何其它实施例中的任何其他特征或元件。The present application describes multiple embodiments, but the description is exemplary rather than restrictive, and it is obvious to those skilled in the art that there may be more embodiments and implementations within the scope of the embodiments described in the present application. Although many possible feature combinations are shown in the drawings and discussed in the specific embodiments, many other combinations of the disclosed features are also possible. Unless specifically limited, any feature or element of any embodiment may be used in combination with any other feature or element in any other embodiment, or may replace any other feature or element in any other embodiment.

本申请包括并设想了与本领域普通技术人员已知的特征和元件的组合。本申请已经公开的实施例、特征和元件也可以与任何常规特征或元件组合，以形成由权利要求限定的独特的发明方案。任何实施例的任何特征或元件也可以与来自其它发明方案的特征或元件组合，以形成另一个由权利要求限定的独特的发明方案。因此，应当理解，在本申请中示出和/或讨论的任何特征可以单独地或以任何适当的组合来实现。因此，除了根据所附权利要求及其等同替换所做的限制以外，实施例不受其它限制。此外，可以在所附权利要求的保护范围内进行各种修改和改变。The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in the present application may also be combined with any conventional features or elements to form a unique invention scheme defined by the claims. Any features or elements of any embodiment may also be combined with features or elements from other invention schemes to form another unique invention scheme defined by the claims. Therefore, it should be understood that any feature shown and/or discussed in the present application may be implemented individually or in any appropriate combination. Therefore, except for the limitations made according to the attached claims and their equivalents, the embodiments are not subject to other restrictions. In addition, various modifications and changes may be made within the scope of protection of the attached claims.

此外，在描述具有代表性的实施例时，说明书可能已经将方法和/或过程呈现为特定的步骤序列。然而，在该方法或过程不依赖于本文所述步骤的特定顺序的程度上，该方法或过程不应限于所述的特定顺序的步骤。如本领域普通技术人员将理解的，其它的步骤顺序也是可能的。因此，说明书中阐述的步骤的特定顺序不应被解释为对权利要求的限制。此外，针对该方法和/或过程的权利要求不应限于按照所写顺序执行它们的步骤，本领域技术人员可以容易地理解，这些顺序可以变化，并且仍然保持在本申请实施例的精神和范围内。In addition, when describing representative embodiments, the specification may have presented the method and/or process as a specific sequence of steps. However, to the extent that the method or process does not rely on the specific order of the steps described herein, the method or process should not be limited to the steps of the specific order described. As will be understood by those of ordinary skill in the art, other sequences of steps are also possible. Therefore, the specific sequence of the steps set forth in the specification should not be interpreted as a limitation to the claims. In addition, the claims for the method and/or process should not be limited to the steps of performing them in the order written, and those skilled in the art can easily understand that these sequences can be changed and still remain within the spirit and scope of the embodiments of the present application.

在进行图像/视频修复时，根据修复对象的不同可以分为图像修复与视频修复两类：When performing image/video restoration, it can be divided into two categories: image restoration and video restoration according to the different restoration objects:

图像修复中常用的方法是基于补丁的方法：它在图像中寻找与破损区域边缘最为相似的像素块，然后通过一系列变换将相似的像素块中的像素值填充到破损区域。这种方法一般是通过不断迭代优化来完成修复，因此速度非常慢。在深度学习方法被应用到该领域后，Encoder-Decoder网络结构被广泛使用，该网络以原始图像作为输入，直接输出修复完成的图像，在速度上有了巨大的提升。不过，本申请发明人在实践中敏锐的发现：单纯利用图像的信息无法精确还原需要修复的区域。The commonly used method in image restoration is the patch-based method: it searches for the pixel block in the image that is most similar to the edge of the damaged area, and then fills the pixel values in the similar pixel block into the damaged area through a series of transformations. This method generally completes the repair through continuous iterative optimization, so the speed is very slow. After deep learning methods were applied to this field, the Encoder-Decoder network structure was widely used. The network takes the original image as input and directly outputs the repaired image, which has greatly improved the speed. However, the inventors of this application have keenly discovered in practice that it is impossible to accurately restore the area that needs to be repaired using only image information.

视频修复中常用的方法也是基于补丁的方法，最初人们只是简单地对视频中每一帧独立使用图像修复的算法，而这并没有利用到视频中的时间序列关系。更加通用的做法是在视频中寻找相似的空间-时间像素块，然后将这些像素块填补到空白区域。但是和图像修复的方法类似，这种优化的方法也存在耗时较长的弊端。另外一种常用的方法是基于光流的方法，它通过计算出视频中相邻两帧之间的光流，利用光流信息将视频中完好的部分直接传递到破损的空洞处，如果使用深度网络来估计光流，该方法能够在达到更好的效果同时有更高的速度，然而这种方法也不能够精确还原需要修复的区域。The commonly used method in video restoration is also based on the patch method. Initially, people simply used the image restoration algorithm independently for each frame in the video, which did not take advantage of the temporal series relationship in the video. A more common approach is to find similar space-time pixel blocks in the video and then fill these pixel blocks into the blank area. However, similar to the image restoration method, this optimization method also has the disadvantage of being time-consuming. Another commonly used method is the optical flow-based method, which calculates the optical flow between two adjacent frames in the video and uses the optical flow information to transfer the intact part of the video directly to the damaged hole. If a deep network is used to estimate the optical flow, this method can achieve better results and higher speed. However, this method cannot accurately restore the area that needs to be repaired.

并且，本申请发明人在实践中还发现：上述这些修复方法都是基于图像/视频中需要修复的区域已经被标注出来的假设上完成的，然而标注需要移除的区域依然需要手工完成，这在实际应用中是不可行的。Furthermore, the inventors of the present application have also found in practice that the above-mentioned repair methods are all based on the assumption that the areas to be repaired in the image/video have been marked, but marking the areas to be removed still needs to be done manually, which is not feasible in practical applications.

如图1所示，本申请实施例提供了一种视频图像编辑方法，包括：As shown in FIG1 , the embodiment of the present application provides a video image editing method, comprising:

S100：选取输入视频中的一帧目标帧图像与多帧辅助帧图像；S100: selecting a target frame image and multiple auxiliary frame images in an input video;

S101：将目标帧图像与多帧辅助帧图像分别进行配对并提取光流；S101: Pairing the target frame image with multiple auxiliary frame images respectively and extracting optical flows;

该步骤包括：将目标帧图像与多帧辅助帧图像分别进行配对，得到多个配对组合，每个配对组合包括目标帧图像和一帧辅助帧图像；This step includes: pairing the target frame image with multiple auxiliary frame images respectively to obtain multiple pairing combinations, each pairing combination includes the target frame image and one auxiliary frame image;

对于每个配对组合分别提取一对光流，包括：反向光流和前向光流；反向光流为从目标帧图像到该配对组合中辅助帧图像的光流，前向光流为从该配对组合中辅助帧图像到目标帧图像的光流；For each pairing combination, a pair of optical flows is extracted respectively, including: a reverse optical flow and a forward optical flow; the reverse optical flow is the optical flow from the target frame image to the auxiliary frame image in the pairing combination, and the forward optical flow is the optical flow from the auxiliary frame image in the pairing combination to the target frame image;

S102：由每一对光流的一致性判断结果，得到图像的一致性标记；S102: Obtaining a consistency mark of the image based on the consistency judgment result of each pair of optical flows;

S103：获取目标帧图像中的每个实例对应的分割标记；S103: Obtaining a segmentation mark corresponding to each instance in the target frame image;

S104：利用分割标记与一致性标记得到组合标记；该步骤包括：S104: Obtain a combined tag using the segmentation tag and the consistency tag; this step includes:

分别确定每个实例对应的分割标记与一致性标记的重合度，利用重合度高于预设条件的分割标记，与一致性标记组合得到组合标记，组合标记用于标识出目标帧图像中去除待移除区域后的结果。The degree of overlap between the segmentation mark and the consistency mark corresponding to each instance is determined respectively, and the segmentation mark with an overlap higher than a preset condition is combined with the consistency mark to obtain a combined mark. The combined mark is used to identify the result after removing the area to be removed in the target frame image.

在步骤S100中，选取目标帧图像作为主要操作对象，即该目标帧图像是需要进行图像修复的对象，之后的操作都会以该目标帧图像作为基准进行；所选取的辅助帧图像用于与目标帧图像进行对比并提供一定的信息，以辅助帧图像提供的信息为依据，可以对目标帧图像进行编辑及修复。目标帧图像可以任意选取，如可以在视频的第一帧图像、最后一帧图像以及中间任意帧图像中进行选取，辅助帧图像应当选取除目标帧图像以外的图像，并且尽量均匀分布在整个视频序列中，以便能够尽量采集到整个视频的画面内容作参考，为编辑目标帧图像提供更加全面的信息。在需要进行视频修复时，可以将需要修复的每一帧视频画面分别选取为目标帧图像来进行修复。In step S100, the target frame image is selected as the main operation object, that is, the target frame image is the object that needs to be repaired, and the subsequent operations will be performed based on the target frame image; the selected auxiliary frame image is used to compare with the target frame image and provide certain information. Based on the information provided by the auxiliary frame image, the target frame image can be edited and repaired. The target frame image can be selected arbitrarily, such as the first frame image, the last frame image, and any frame image in the middle of the video. The auxiliary frame images should select images other than the target frame image, and should be distributed as evenly as possible in the entire video sequence, so as to capture the entire video content as much as possible for reference, and provide more comprehensive information for editing the target frame image. When video repair is required, each frame of the video that needs to be repaired can be selected as a target frame image for repair.

对上述步骤S101和S103的执行顺序，本申请实施例不作限制，可以根据需要进行选择。The embodiment of the present application does not limit the execution order of the above steps S101 and S103, and can be selected as needed.

在本实施例中，实例对应于图像画面上的实体，可以是图像中出现的每个不同物体，一张图像上可能存在多个实例，比如在某段视频的某帧画面中出现的人物、建筑物、动植物、白云、风筝等，这些均可以看成是该帧画面中的实例。用户可以选择图像画面上的任意实例进行操作。In this embodiment, an instance corresponds to an entity on the image screen, which can be each different object appearing in the image. There may be multiple instances on an image, such as people, buildings, animals, plants, white clouds, kites, etc. appearing in a certain frame of a certain video. These can all be regarded as instances in the frame. The user can select any instance on the image screen to operate.

在步骤S104得到组合标记后，表示已经自动去除了目标帧图像中需要移除的区域，此时可以采用现有的图像修复方法对目标帧图像进行修复，例如：深度流动导引视频修补(deep flow-guided video inpainting)、流边导向视频修复(Flow-edge Guided VideoCompletion)以及动态视频的时间相干修复(Temporally coherent completion ofdynamic video)等方法，在此不再赘述。After the combined mark is obtained in step S104, it means that the area to be removed in the target frame image has been automatically removed. At this time, the target frame image can be repaired by using existing image restoration methods, such as deep flow-guided video inpainting, flow-edge guided video completion, and temporally coherent completion of dynamic video, etc., which will not be repeated here.

根据人们的摄影习惯，拍摄对象的主体在全部视频画面中占据了最多的部分(即从整体来看，拍摄对象的主体出现的画面帧数最多，在视频中出现的时间最长)，是贯穿视频画面始终的，上述实施例的方法由于在输入视频中选取出了多帧辅助帧图像，在步骤S101-S102中，通过比较每一对光流中反向光流和前向光流的一致性，即能够近似的得出目标帧图像与整个视频长度中的画面内容的一致性，再与目标帧图像中的实例对应的分割标记相比较，即能够得出当前的目标帧图像中，哪些实例是非拍摄主体，哪些区域是需要去除的结果，在此基础上保留重合度高于预设条件的分割标记，并与一致性标记相组合，得到的组合标记即为去除待移除区域后的标识结果。According to people's photography habits, the main body of the photographed object occupies the largest part in the entire video screen (that is, from the overall point of view, the main body of the photographed object appears in the largest number of frames and appears in the video for the longest time), and runs through the entire video screen. Since the method of the above embodiment selects multiple auxiliary frame images in the input video, in steps S101-S102, by comparing the consistency of the reverse optical flow and the forward optical flow in each pair of optical flows, it is possible to approximately obtain the consistency of the target frame image with the screen content in the entire video length, and then compare it with the segmentation marks corresponding to the instances in the target frame image, it is possible to obtain which instances in the current target frame image are non-shooting subjects and which areas are the results that need to be removed. On this basis, the segmentation marks with a degree of overlap higher than the preset condition are retained and combined with the consistency marks. The obtained combined marks are the identification results after removing the areas to be removed.

与传统方法相比，本申请实施例的方法拥有更快的运算速度，能够在实际场景中实时地给出计算结果；与其它利用神经网络的方法相比，本申请实施例的方法能够自动计算非主体物体标记，给出鲁棒且完整的标记，因此极大地减少了图像编辑中所需要的人力成本。Compared with traditional methods, the method of the embodiment of the present application has a faster computing speed and can give calculation results in real time in actual scenarios; compared with other methods using neural networks, the method of the embodiment of the present application can automatically calculate non-subject object labels and give robust and complete labels, thereby greatly reducing the manpower cost required in image editing.

一种示例性的实施例中，在上述步骤S100中，当在选取输入视频中的一帧目标帧图像与多帧辅助帧图像时，以目标帧图像为中心，在输入视频中以固定间隔选取出所有图像作为辅助帧图像。In an exemplary embodiment, in the above step S100, when selecting a target frame image and multiple auxiliary frame images in the input video, all images are selected as auxiliary frame images at fixed intervals in the input video with the target frame image as the center.

以目标帧图像选定为输入视频中的第一帧图像为例进行说明，这种情况下，在与目标帧图像依次间隔s，2s，...，ns处选取辅助帧图像，其中，s可以代表帧数，即与目标帧图像每相隔s帧画面选取一帧辅助帧图像，s也可以代表相邻两帧之间的时间间隔，即与目标帧图像每相隔s的时间间隔即选取一帧辅助帧图像，s也可以代表其它类似的意义，本实施例对固定间隔s的意义不作限制。第ns帧图像是视频中所能选到的最后一帧辅助帧图像，即在输入视频中将所有符合该固定间隔的辅助帧图像均选取出来。Take the target frame image as the first frame image in the input video as an example for explanation. In this case, auxiliary frame images are selected at intervals of s, 2s, ..., ns from the target frame image, where s can represent the number of frames, that is, one auxiliary frame image is selected every s frames from the target frame image, s can also represent the time interval between two adjacent frames, that is, one auxiliary frame image is selected every s time interval from the target frame image, and s can also represent other similar meanings. This embodiment does not limit the meaning of the fixed interval s. The nsth frame image is the last auxiliary frame image that can be selected in the video, that is, all auxiliary frame images that meet the fixed interval in the input video are selected.

一方面，为了保证辅助帧图像与目标帧图像之间的画面内容存在较大的差异，s的选择不能过小；另一方面，为了保证辅助帧的数量不会太少，s的选择也不能过大。一种实施方式中，选取s使得辅助帧的帧率为6fps，在其它实施方式中，还可以取3fps～12fps中的任意值，本申请实施例对此不作限制。On the one hand, in order to ensure that there is a large difference in the picture content between the auxiliary frame image and the target frame image, s cannot be too small; on the other hand, in order to ensure that the number of auxiliary frames is not too small, s cannot be too large. In one embodiment, s is selected so that the frame rate of the auxiliary frame is 6fps. In other embodiments, it can also take any value between 3fps and 12fps, and the embodiment of the present application does not limit this.

当选定的目标帧图像是位于输入视频的中间帧数的图像时，可以以输入视频的目标帧图像为中心，向前及向后计算固定间隔并选取辅助帧图像，选取的过程可以参照上面的例子，当目标帧图像选定为输入视频中的最后一帧图像时选择过程也与上面的例子类似，在此不再赘述。When the selected target frame image is an image located in the middle of the input video, the target frame image of the input video can be used as the center, and fixed intervals can be calculated forward and backward to select auxiliary frame images. The selection process can refer to the above example. When the target frame image is selected as the last frame image in the input video, the selection process is similar to the above example and will not be repeated here.

在其它实施方式中，选取辅助帧图像时也可以不采用固定的帧数或时间间隔，如可以采用随机的方式或者指定帧数的方式或者其它的方式进行选取，本申请实施例对此不作限制。In other embodiments, a fixed number of frames or time intervals may not be used when selecting auxiliary frame images. For example, selection may be performed randomly or with a specified number of frames or in other ways, and the embodiments of the present application do not impose any restrictions on this.

在上述步骤S101中，在将目标帧图像与多帧辅助帧图像分别进行配对后，每一帧辅助帧图像均对应于目标帧图像，便于后续进行比较以获取信息。配对组合的数量与所选取的辅助帧图像的数量相同。一种示例性的实施例中，图像的一致性标记包括目标帧图像中每个格点位置的一致性标记；根据目标帧中所包含的格点位置的一致性标记，可以得到目标帧图像的一致性标记。In the above step S101, after pairing the target frame image with multiple auxiliary frame images respectively, each auxiliary frame image corresponds to the target frame image, which is convenient for subsequent comparison to obtain information. The number of pairing combinations is the same as the number of selected auxiliary frame images. In an exemplary embodiment, the consistency mark of the image includes a consistency mark of each grid point position in the target frame image; according to the consistency mark of the grid point position contained in the target frame, the consistency mark of the target frame image can be obtained.

根据每一对光流中反向光流和前向光流的一致性判断结果，得到图像的一致性标记，包括：According to the consistency judgment results of the reverse optical flow and the forward optical flow in each pair of optical flows, the consistency mark of the image is obtained, including:

根据目标帧图像中该格点位置与每个辅助帧图像中对应的格点位置之间的一致性判断结果，得到该格点位置的一致性标记，包括：According to the consistency judgment result between the grid point position in the target frame image and the corresponding grid point position in each auxiliary frame image, a consistency mark of the grid point position is obtained, including:

对于每个配对组合，分别根据该配对组合中目标帧图像的该格点位置与辅助帧图像中对应的格点位置之间的一致性判断结果，得到该配对组合中该格点位置的初始一致性标记；For each pairing combination, an initial consistency mark of the grid point position in the pairing combination is obtained according to the consistency judgment result between the grid point position of the target frame image in the pairing combination and the corresponding grid point position in the auxiliary frame image;

将所有配对组合中该格点位置的初始一致性标记累加，根据累加结果得到该格点位置的一致性标记。The initial consistent marks of the grid point position in all pairing combinations are accumulated, and the consistent mark of the grid point position is obtained according to the accumulated result.

以选取的辅助帧图像数为n进行说明，此时相应的配对组合数也为n，对目标帧图像中每个格点的位置，分别与每一帧辅助帧图像的对应格点位置进行对比后，对目标帧图像的每个格点位置均得到n个初始一致性标记，然后将每个格点位置的这n个初始一致性标记进行累加，累加后每个格点位置只对应一个一致性标记。Assume that the number of auxiliary frame images selected is n. The corresponding number of pairing combinations is also n. After comparing the position of each grid point in the target frame image with the corresponding grid point position of each auxiliary frame image, n initial consistency marks are obtained for each grid point position of the target frame image. Then, these n initial consistency marks of each grid point position are accumulated. After the accumulation, each grid point position corresponds to only one consistency mark.

一种示例性的实施例中，对于每个配对组合，分别根据该配对组合中目标帧图像的该格点位置与辅助帧图像中对应的格点位置之间的一致性判断结果，得到该配对组合中该格点位置的初始一致性标记包括：In an exemplary embodiment, for each pairing combination, obtaining the initial consistency mark of the grid point position in the pairing combination according to the consistency judgment result between the grid point position of the target frame image in the pairing combination and the corresponding grid point position in the auxiliary frame image includes:

分别根据每一对光流获得相应的配对组合中新的格点位置；新的格点位置为根据该配对组合中辅助帧图像中的与目标帧图像的该格点位置所对应的格点位置结合前向光流计算得出；According to each pair of optical flows, a new grid point position in the corresponding pairing combination is obtained; the new grid point position is calculated based on the grid point position in the auxiliary frame image in the pairing combination corresponding to the grid point position in the target frame image in combination with the forward optical flow;

对于每个配对组合，分别根据目标帧图像中该格点位置与该配对组合中新的格点位置之间的一致性判断结果，获取该配对组合中该格点位置的初始一致性标记。For each pairing combination, an initial consistency mark of the grid point position in the pairing combination is obtained according to the consistency judgment result between the grid point position in the target frame image and the new grid point position in the pairing combination.

本实施例中，通过判断目标帧图像的某个格点位置与和该格点位置对应的新格点位置的一致性，能够得到前向光流与反向光流之间的光流一致性标记，也可以使用其它方法来判断前向光流与反向光流之间的一致性，例如神经网络拟合的方法等，本申请实施例对此不做限制。所获得的初始一致性标记可以是图像的形式，即光流一致性标记图，当选取的辅助帧图像数为n时，获得的n个初始一致性标记即是n张光流一致性标记图。上述计算一致性标记是以目标帧图像为基准所进行的计算。In this embodiment, by judging the consistency between a certain grid point position of the target frame image and the new grid point position corresponding to the grid point position, the optical flow consistency mark between the forward optical flow and the reverse optical flow can be obtained. Other methods can also be used to judge the consistency between the forward optical flow and the reverse optical flow, such as the neural network fitting method, etc., which is not limited in this embodiment of the present application. The initial consistency mark obtained can be in the form of an image, that is, an optical flow consistency mark map. When the number of auxiliary frame images selected is n, the n initial consistency marks obtained are n optical flow consistency mark maps. The above-mentioned calculation of the consistency mark is a calculation based on the target frame image.

一种示例性的实施例中，将目标帧图像与辅助帧图像分别进行配对后，以目标帧图像、辅助帧图像两张图像作为输入，通过光流网络提取光流，光流网络输出从目标帧图像到辅助帧图像，与从辅助帧图像到目标帧图像的光流。对于每一对目标帧图像与辅助帧图像，都能得到一对光流。In an exemplary embodiment, after pairing the target frame image and the auxiliary frame image respectively, the target frame image and the auxiliary frame image are used as input, and the optical flow is extracted through the optical flow network, and the optical flow network outputs the optical flow from the target frame image to the auxiliary frame image and from the auxiliary frame image to the target frame image. For each pair of target frame image and auxiliary frame image, a pair of optical flows can be obtained.

一种实施方式中，光流网络可以采用FlowNet2.0。采用该网络能够同时取得较高的精度与较快的速度，能够在较短的时间内完成光流的提取。为了获取目标帧图像与辅助帧图像之间的双向光流，可以首先按照目标帧、辅助帧的顺序进行输入，然后按照相反顺序输入，得到网络输出的两张光流。In one implementation, the optical flow network may use FlowNet2.0. The use of this network can achieve both high accuracy and high speed, and can complete the extraction of optical flow in a shorter time. In order to obtain the bidirectional optical flow between the target frame image and the auxiliary frame image, the target frame and the auxiliary frame can be input first, and then input in the reverse order to obtain two optical flows output by the network.

本申请实施例对提取光流的方式以及采用何种光流网络不作限制。The embodiments of the present application do not limit the method of extracting optical flow and the type of optical flow network used.

一种示例性的实施例中，分别根据每一对光流获得相应的配对组合中新的格点位置，包括：In an exemplary embodiment, obtaining a new grid point position in a corresponding pairing combination according to each pair of optical flows includes:

对于每一对光流，按照反向光流将目标帧图像的该格点位置传播到辅助帧图像，根据目标帧图像的该格点位置在辅助帧图像上对应的格点位置以及前向光流计算出新的格点位置；For each pair of optical flows, the grid point position of the target frame image is propagated to the auxiliary frame image according to the reverse optical flow, and the new grid point position is calculated according to the grid point position corresponding to the grid point position of the target frame image on the auxiliary frame image and the forward optical flow;

对于每个配对组合，分别根据目标帧图像中该格点位置与该配对组合中新的格点位置之间的一致性判断结果，获取该配对组合中该格点位置的初始一致性标记包括：For each pairing combination, respectively according to the consistency judgment result between the grid point position in the target frame image and the new grid point position in the pairing combination, obtaining the initial consistency mark of the grid point position in the pairing combination includes:

对于每个配对组合分别进行如下处理：For each pairing combination, the following processing is performed:

比较目标帧图像中该格点位置，与将该配对组合中新的格点位置按照前向光流回传到目标帧图像上后所在的格点位置之间的差异值，当差异值小于或等于预设的第一阈值时，将该格点位置的初始一致性标记记录为表示一致的值。Compare the grid point position in the target frame image with the difference between the grid point position of the new grid point position in the paired combination after being transmitted back to the target frame image according to the forward optical flow; when the difference is less than or equal to a preset first threshold, record the initial consistency mark of the grid point position as a value indicating consistency.

对于每一对光流，先按照反向光流将目标帧图像的格点位置传播到辅助帧图像，根据目标帧图像的格点位置在辅助帧图像上的位置以及前向光流计算出新的格点位置，再将新的格点位置按照前向光流回传到目标帧图像上，比较新的格点位置与目标帧图像的格点位置的差异值，将差异值小于或等于预设的第一阈值的位置标记为一致。将差异值大于预设的第一阈值的格点位置视为不一致。差异值等于预设的第一阈值的情况可以根据需要进行设置，本实施例对此不作限制。也可以采用其它方式来比较目标帧图像与辅助帧图像对应的格点位置的一致性，本申请实施例对此不做限制。For each pair of optical flows, the grid point positions of the target frame image are first propagated to the auxiliary frame image according to the reverse optical flow, and the new grid point positions are calculated according to the positions of the grid point positions of the target frame image on the auxiliary frame image and the forward optical flow, and then the new grid point positions are propagated back to the target frame image according to the forward optical flow, and the difference values between the new grid point positions and the grid point positions of the target frame image are compared, and the positions where the difference values are less than or equal to the preset first threshold are marked as consistent. The grid point positions where the difference values are greater than the preset first threshold are considered inconsistent. The case where the difference value is equal to the preset first threshold can be set as needed, and this embodiment does not limit this. Other methods can also be used to compare the consistency of the grid point positions corresponding to the target frame image and the auxiliary frame image, and this embodiment of the present application does not limit this.

一种示例性的实施例中，根据目标帧图像的格点位置在辅助帧图像上的位置以及前向光流计算出新的格点位置，包括：In an exemplary embodiment, calculating new grid point positions according to the positions of the grid point positions of the target frame image on the auxiliary frame image and the forward optical flow includes:

将目标帧图像的格点位置在辅助帧图像上的位置的坐标值，与该位置处的前向光流的坐标值相加，得到新的格点位置的坐标。The coordinate value of the position of the grid point of the target frame image on the auxiliary frame image is added to the coordinate value of the forward optical flow at the position to obtain the coordinate of the new grid point position.

一种示例性的实施例中，差异值包括欧式距离。In an exemplary embodiment, the difference value comprises a Euclidean distance.

在将格点位置从目标帧图像传播到辅助帧图像后，可以使用双线性插值的方法估计出每个格点的光流(即新的格点位置的坐标)，在其它实施例中，也可以使用最近邻、三次插值等方法进行计算，本申请实施例对此不作限制。在将新的格点位置回传给目标帧图像之后，以新的格点位置和原始的格点位置的欧式距离是否小于某一特定阈值(即第一阈值)为判据来确定是否一致，该第一阈值决定一致性标记的严格程度，可以根据需要进行设置，在本实施例中可以将第一阈值设置为2。After the grid point positions are propagated from the target frame image to the auxiliary frame image, the optical flow of each grid point (i.e., the coordinates of the new grid point positions) can be estimated using the bilinear interpolation method. In other embodiments, the nearest neighbor, cubic interpolation, and other methods can also be used for calculation, which is not limited in the embodiments of the present application. After the new grid point positions are transmitted back to the target frame image, the consistency is determined based on whether the Euclidean distance between the new grid point position and the original grid point position is less than a certain threshold (i.e., the first threshold). The first threshold determines the strictness of the consistency mark and can be set as needed. In this embodiment, the first threshold can be set to 2.

一种示例性的实施例中，根据累加结果得到该格点位置的一致性标记，包括：In an exemplary embodiment, obtaining a consistency mark of the grid point position according to the accumulation result includes:

将累加结果与目标帧图像的最大光流标记相乘得到第一标记，对第一标记进行阈值截断处理后得到该格点位置的二值化的一致性标记；其中，目标帧图像的最大光流标记为目标帧图像上每个像素点的光流的模长的最大值。The accumulated result is multiplied by the maximum optical flow mark of the target frame image to obtain a first mark, and the first mark is threshold-cut off to obtain a binary consistency mark of the grid position; wherein the maximum optical flow mark of the target frame image is the maximum value of the modulus of the optical flow of each pixel on the target frame image.

一种示例性的实施例中，目标帧图像的最大光流标记通过以下方式计算得到：In an exemplary embodiment, the maximum optical flow mark of the target frame image is calculated in the following way:

对每一对光流中的反向光流，分别计算每个像素点上的光流的模长，取每个像素点上所有光流的模长的最大值作为目标帧图像的最大光流标记。For each pair of reverse optical flows, the modulus of the optical flow at each pixel is calculated respectively, and the maximum modulus of all optical flows at each pixel is taken as the maximum optical flow mark of the target frame image.

当选取的辅助帧图像数为n时，存在n张反向光流图，目标帧图像的每个像素点上的光流的模长也有n个值，对这n个值进行比较，可以得到每个像素点上的最大光流标记。即每一对光流在进行计算后，可以得到一张光流标记图(或光流模长图)，综合所有的光流标记或光流标记图，可以得到一张最大光流标记图。When the number of selected auxiliary frame images is n, there are n reverse optical flow maps, and the modulus length of the optical flow at each pixel of the target frame image also has n values. By comparing these n values, the maximum optical flow mark at each pixel can be obtained. That is, after calculating each pair of optical flows, an optical flow mark map (or optical flow modulus length map) can be obtained. Combining all optical flow marks or optical flow mark maps, a maximum optical flow mark map can be obtained.

一种示例性的实施例中，采用如下公式计算光流的模长：In an exemplary embodiment, the modulus of the optical flow is calculated using the following formula:

其中，l_i表示第i对光流中的反向光流的模长，x_i，y_i分别表示每个像素在横向与纵向上的光流大小(或称光流速度)；Where, l _i represents the modulus of the reverse optical flow in the i-th pair of optical flows, x _i and y _i represent the optical flow size (or optical flow speed) of each pixel in the horizontal and vertical directions respectively;

一种示例性的实施例中，使用如下公式计算最大光流标记：In an exemplary embodiment, the maximum optical flow mark is calculated using the following formula:

L_max＝max(L₁，L₂，...，L_n)，L _max =max (L ₁ , L ₂ ,..., L _n ),

其中，L表示光流模长图，n代表光流模长图的总数。Wherein, L represents the optical flow modulus length map, and n represents the total number of optical flow modulus length maps.

一种示例性的实施例中，在上述步骤S102中，获取目标帧图像中的每个实例对应的分割标记，包括：In an exemplary embodiment, in the above step S102, obtaining a segmentation mark corresponding to each instance in the target frame image includes:

以目标帧图像作为实例分割网络的输入，由实例分割网络提取并输出目标帧图像中所有实例对应的分割标记。The target frame image is used as the input of the instance segmentation network, which extracts and outputs the segmentation labels corresponding to all instances in the target frame image.

当目标帧图像中包含了多个实例时，可以得到多个分割标记，即多张分割标记图。When the target frame image contains multiple instances, multiple segmentation marks, that is, multiple segmentation mark maps, can be obtained.

在本实施例中，实例分割网络采用Mask RCNN，在其它实施例中，还可以采用其它的实例分割网络，该网络以一张图像作为输入，能够输出图像中所有实例对应的分割标记。例如，当拍摄场景以人物为中心，希望去除的是非主体的人物时，可以选取分割标记中对应于“人”一类的分割标记。In this embodiment, the instance segmentation network uses Mask RCNN. In other embodiments, other instance segmentation networks can also be used. The network takes an image as input and can output segmentation marks corresponding to all instances in the image. For example, when the shooting scene is centered on a person and it is desired to remove non-subject persons, the segmentation mark corresponding to the "person" class can be selected from the segmentation marks.

使用深度网络完成光流预测与实例分割能够保证较快的处理速度，使得自动编辑的实时应用成为可能。Using deep networks to perform optical flow prediction and instance segmentation can ensure faster processing speed, making real-time applications of automatic editing possible.

一种示例性的实施例中，进行阈值截断处理，包括：In an exemplary embodiment, performing threshold truncation processing includes:

采用第二阈值进行阈值截断处理，第二阈值设置为第一标记的均值的倍数，第一标记的均值为第一标记与配对组合的数量的商。A second threshold is used to perform threshold truncation processing, where the second threshold is set to be a multiple of the mean of the first marker, and the mean of the first marker is a quotient of the first marker and the number of paired combinations.

也可以采用其它方式来设置第二阈值，本申请实施例对此不作限制。通过调节第二阈值，可以调节二值化的一致性标记的结果，可以根据需要合理地设置第二阈值，以便获得更准确的结果，进而有助于达到更好的图像/视频修复效果。The second threshold may also be set in other ways, which are not limited in the present application. By adjusting the second threshold, the result of the binary consistency mark may be adjusted, and the second threshold may be reasonably set as needed to obtain a more accurate result, thereby helping to achieve a better image/video restoration effect.

一种示例性的实施例中，利用重合度高于预设条件的分割标记，与一致性标记组合得到组合标记，包括：In an exemplary embodiment, a segmentation mark with a degree of overlap higher than a preset condition is combined with a consistency mark to obtain a combined mark, including:

分别判断每个分割标记与二值化的一致性标记的重合度，将重合度高于预设条件的分割标记加入到二值化的一致性标记中，得到组合标记。The degree of coincidence between each segmentation mark and the binary consistency mark is determined respectively, and the segmentation marks with a degree of coincidence higher than a preset condition are added to the binary consistency mark to obtain a combined mark.

一种示例性的实施例中，重合度为每个分割标记与一致性标记的交叠区域的重合度，预设条件为第三阈值，所述第三阈值设置为当前进行判断的分割标记的面积值的倍数。In an exemplary embodiment, the degree of overlap is the degree of overlap of the overlapping area of each segmentation mark and the consistency mark, and the preset condition is a third threshold value, which is set to a multiple of the area value of the segmentation mark currently being judged.

也可以采用其它方式来设置第三阈值，本申请实施例对此不作限制。The third threshold may also be set in other ways, which is not limited in the embodiment of the present application.

一种示例性的实施例中，可以采用以下方法计算组合标记：In an exemplary embodiment, the combined mark may be calculated using the following method:

首先，将所有的初始一致性标记记为通过下式将多张初始一致性标记进行累加：First, all initial consistency marks are recorded as Multiple initial consistency marks are accumulated using the following formula:

然后，将累加的结果与最大光流标记L_max相乘：Then, multiply the accumulated result by the maximum optical flow marker L _max :

m^fc＝m^con·L_max。 ^mfc = ^mcon · _Lmax .

对得到的结果进行阈值截断处理得到二值化的光流一致性标记。The obtained result is subjected to threshold truncation to obtain a binary optical flow consistency mark.

在本实施例中，使用m^fc的像素均值的倍数作为第二阈值，决定倍数的缩放因子定为2，然而本申请实施例对此不作限制。In this embodiment, a multiple of the pixel mean value of m ^fc is used as the second threshold, and a scaling factor for determining the multiple is set to 2, but this embodiment of the present application is not limited to this.

最后，对于目标帧图像的每张分割标记，分别计算该分割标记与二值化的光流一致性标记的交叠区域的大小，并比较其与分割标记的整体大小的关系，如果交叠区域大小大于分割标记的整体面积的某个倍数，则将该分割标记加入二值化的光流一致性标记中，决定倍数的缩放因子设置为0.3，本申请实施例对此不作限制。Finally, for each segmentation mark of the target frame image, the size of the overlapping area between the segmentation mark and the binary optical flow consistency mark is calculated respectively, and compared with the overall size of the segmentation mark. If the size of the overlapping area is larger than a certain multiple of the overall area of the segmentation mark, the segmentation mark is added to the binary optical flow consistency mark, and the scaling factor that determines the multiple is set to 0.3, which is not limited in this embodiment of the present application.

本申请实施例的方法，结合了一致性标记，光流标记，分割标记，能够自动地给出视频中非主体的物体标记，使得图像编辑能够不需要手工干预地完成。由于结合了多种标记，最终生成的组合标记更加鲁棒而且完整。本申请实施例的方法能够应用在手机摄影的自动图像/视频编辑的场景中。The method of the embodiment of the present application combines consistency tags, optical flow tags, and segmentation tags, and can automatically give tags for non-subject objects in the video, so that image editing can be completed without manual intervention. Due to the combination of multiple tags, the final generated combined tag is more robust and complete. The method of the embodiment of the present application can be applied in the scene of automatic image/video editing of mobile phone photography.

本申请实施例还提供一种视频图像修复方法，包括：上述实施例中的视频图像编辑方法，该方法还包括：The embodiment of the present application also provides a video image repair method, including: the video image editing method in the above embodiment, the method further includes:

分别将每一帧辅助帧图像按照前向光流进行传播，根据组合标记选取传播结果中需要的像素，将所有选取的像素在对应位置上取均值作为填充内容填入目标帧图像中，得到修复后的图像。Each auxiliary frame image is propagated according to the forward optical flow, and the required pixels in the propagation result are selected according to the combined mark. The average of all selected pixels at the corresponding positions is taken as the filling content and filled into the target frame image to obtain the repaired image.

该方法在上述步骤S103之后执行。对于每一对目标帧图像与辅助帧图像，将辅助帧图像按照前向光流进行传播，根据步骤S103中计算出的组合标记，如果传播结果的像素在组合标记中则保存，不然则丢弃；最后将所有选择的像素在对应位置上取均值作为填充内容填入目标帧图像中，完成修复。This method is performed after the above step S103. For each pair of target frame image and auxiliary frame image, the auxiliary frame image is propagated according to the forward optical flow, and according to the combined mark calculated in step S103, if the pixel of the propagation result is in the combined mark, it is saved, otherwise it is discarded; finally, the average of all selected pixels at the corresponding position is taken as the filling content to fill in the target frame image to complete the repair.

本申请实施例的修复方法可以自动且精确地消除视频中的非主体物体，并且填充在空间-时间上都一致的内容。利用一致性标记、光流标记与分割标记等多种标记，能够稳定且完整地识别视频中非主体物体；基于光流传播的修复方法能够利用视频中的前后信息，使得修复结果在空间与时间上都有较好的一致性，并在最大程度上减少人工痕迹的出现，保证修复结果完整自然。可以应用于手机摄影的自动图像/视频编辑。The repair method of the embodiment of the present application can automatically and accurately eliminate non-subject objects in the video and fill in content that is consistent in both space and time. Using multiple markers such as consistency markers, optical flow markers, and segmentation markers, non-subject objects in the video can be stably and completely identified; the repair method based on optical flow propagation can use the previous and next information in the video, so that the repair result has good consistency in space and time, and minimizes the appearance of artificial traces to ensure that the repair result is complete and natural. It can be applied to automatic image/video editing of mobile phone photography.

如图2所示，本申请实施例还提供一种视频图像编辑装置，包括存储器和处理器，存储器用于保存进行视频图像编辑的程序；处理器用于读取进行视频图像编辑的程序，并执行上述任一实施例中的视频图像编辑方法。As shown in Figure 2, an embodiment of the present application also provides a video image editing device, including a memory and a processor, the memory is used to store a program for video image editing; the processor is used to read the program for video image editing and execute the video image editing method in any of the above embodiments.

如图2所示，本申请实施例还提供一种视频图像修复装置，包括存储器和处理器，存储器用于保存进行视频图像修复的程序；处理器用于读取进行视频图像修复的程序，并执行上述任一实施例中的视频图像修复方法。As shown in Figure 2, an embodiment of the present application also provides a video image repair device, including a memory and a processor, the memory is used to store a program for video image repair; the processor is used to read the program for video image repair and execute the video image repair method in any of the above embodiments.

本申请实施例还提供了一种计算机可读存储介质，存储有计算机可执行指令，所述计算机可执行指令用于执行上述任一实施例中的视频图像编辑或视频图像修复方法。An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used to execute the video image editing or video image repair method in any of the above embodiments.

如图3所示，下面以示例一来说明本申请实施例的视频图像编辑、修复方法。As shown in FIG3 , the video image editing and repairing method according to an embodiment of the present application is described below with reference to Example 1.

示例一：Example 1:

S200选取输入视频中的一帧目标帧图像与多帧辅助帧图像；S200 selects a target frame image and multiple auxiliary frame images in the input video;

S201将一帧目标帧图像与多帧辅助帧图像分别进行配对，对每个配对组合分别提取一对光流；S201 pairs a target frame image with multiple auxiliary frame images respectively, and extracts a pair of optical flows for each pairing combination;

在本示例中，目标帧图像选定为输入视频中的第一帧图像；辅助帧图像按照与目标帧图像间隔s，2s，...，ns的顺序选取，其中，选取s使得辅助帧的帧率为6fps，ns为输入视频中所能取到的最后一帧辅助帧图像。In this example, the target frame image is selected as the first frame image in the input video; the auxiliary frame images are selected in the order of s, 2s, ..., ns with respect to the target frame image, where s is selected so that the frame rate of the auxiliary frame is 6fps, and ns is the last auxiliary frame image that can be obtained in the input video.

在选定图像之后，将目标帧图像与辅助帧图像分别进行配对，以目标帧图像、辅助帧图像两张图像作为输入，通过光流网络FlowNet2.0提取光流，在输入时，先按照目标帧、辅助帧的顺序输入，然后按照相反顺序输入，得到该光流网络输出的从目标帧图像到辅助帧图像，与从辅助帧图像到目标帧图像的两张光流。After selecting the image, the target frame image and the auxiliary frame image are paired respectively. The target frame image and the auxiliary frame image are used as input, and the optical flow is extracted through the optical flow network FlowNet2.0. When inputting, first input in the order of target frame and auxiliary frame, and then input in the reverse order to obtain the two optical flows from the target frame image to the auxiliary frame image and from the auxiliary frame image to the target frame image output by the optical flow network.

即对于每一对目标帧与辅助帧，都能得到一对光流，分别为反向光流和前向光流；反向光流为从目标帧图像到该配对组合中辅助帧图像的光流，前向光流为从该配对组合中辅助帧图像到目标帧图像的光流。That is, for each pair of target frame and auxiliary frame, a pair of optical flows can be obtained, namely, reverse optical flow and forward optical flow; the reverse optical flow is the optical flow from the target frame image to the auxiliary frame image in the paired combination, and the forward optical flow is the optical flow from the auxiliary frame image in the paired combination to the target frame image.

S202分别获取每一对光流的初始一致性标记；S202 respectively obtains the initial consistency mark of each pair of optical flows;

对于每一对光流，首先按照反向光流将后一帧图像的格点位置传播到前一帧图像，使用双线性插值的方法估计出每个格点的光流，并得到新的格点位置，再将新的格点位置回传到后一帧图像上，判断新的格点位置与格点位置的欧式距离是否小于第一阈值，在本示例中将第一阈值设置为2。如果新的格点位置与格点位置的欧式距离大于第一阈值，那么将这些格点位置视为不一致，其它格点位置标记为一致。对于每一对光流，都能计算出一张光流一致性标记图。For each pair of optical flows, first propagate the grid point positions of the next frame image to the previous frame image according to the reverse optical flow, use the bilinear interpolation method to estimate the optical flow of each grid point, and get the new grid point position, then propagate the new grid point position back to the next frame image, and determine whether the Euclidean distance between the new grid point position and the grid point position is less than the first threshold. In this example, the first threshold is set to 2. If the Euclidean distance between the new grid point position and the grid point position is greater than the first threshold, then these grid point positions are considered inconsistent, and the other grid point positions are marked as consistent. For each pair of optical flows, an optical flow consistency marking map can be calculated.

其中，后一帧图像为辅助帧图像，前一帧图像为目标帧图像，这样计算出的一致性标记是以目标帧图像为基准的。Among them, the latter frame image is the auxiliary frame image, and the former frame image is the target frame image, so the calculated consistency mark is based on the target frame image.

S203计算所述目标帧图像的最大光流标记；S203 calculates the maximum optical flow mark of the target frame image;

取每一对光流中的反向光流，分别计算每个像素点上光流的模长，在每个像素点上取所有光流中模长最长的值。综合所有光流信息，可以得到一张最大光流标记图。Take the reverse optical flow in each pair of optical flows, calculate the modulus of the optical flow at each pixel, and take the value with the longest modulus among all optical flows at each pixel. Combining all the optical flow information, we can get a maximum optical flow label map.

对每一张目标帧光流上的每个像素，都有横向与纵向的光流速度x_i，y_i，通过下式计算光流模长：For each pixel on the optical flow of each target frame, there are horizontal and vertical optical flow velocities x _i , y _i , and the optical flow modulus is calculated by the following formula:

其中，l_i代表第i对光流中的反向光流的模长，x_i，y_i分别表示每个像素在横向与纵向上的光流大小(或称光流速度)。Wherein, _li represents the modulus of the reverse optical flow in the i-th pair of optical flows, and _xi and _yi represent the optical flow size (or optical flow speed) of each pixel in the horizontal and vertical directions, respectively.

将所有光流模长图记为L₁，L₂，...，L_n，n代表光流模长图的总数，通过下式计算出每个像素点上模长最长的值：All optical flow modulus length maps are recorded as L ₁ , L ₂ , ..., L _n , where n represents the total number of optical flow modulus length maps, and the longest modulus length value at each pixel is calculated by the following formula:

L_max＝max(L₁，L₂，...，L_n)，L _max =max (L ₁ , L ₂ ,..., L _n ),

其中L_max代表最大光流标记。Where _Lmax represents the maximum optical flow mark.

S204获取目标帧图像中的每个实例对应的分割标记；S204 obtains the segmentation mark corresponding to each instance in the target frame image;

以目标帧图像作为输入，使用实例分割网络Mask RCNN提取每个实例对应的分割标记，实例分割网络输出目标帧中所有实例对应的分割标记。对于一张目标帧图像，可能得到一张或多张实例分割标记图。Taking the target frame image as input, the instance segmentation network Mask RCNN is used to extract the segmentation mark corresponding to each instance, and the instance segmentation network outputs the segmentation mark corresponding to all instances in the target frame. For a target frame image, one or more instance segmentation mark maps may be obtained.

对上述步骤S202-S203的执行顺序，本示例不作限制，可以根据需要进行选择；对步骤S204的执行顺序，可以在步骤S200之后、步骤S205之前的任意时间执行，本示例对此不作限制。This example does not limit the execution order of the above steps S202-S203, which can be selected as needed; the execution order of step S204 can be executed at any time after step S200 and before step S205, and this example does not limit this.

S205计算组合标记；S205 calculates the combined mark;

m^fc＝m^con·L_max。 ^mfc = ^mcon · _Lmax .

对得到的结果进行阈值截断处理得到二值化的光流一致性标记，使用m^fc的像素均值的倍数作为第二阈值，决定倍数的缩放因子定为2。The obtained result is subjected to threshold truncation processing to obtain a binary optical flow consistency mark, and the multiple of the pixel mean of ^mfc is used as the second threshold, and the scaling factor that determines the multiple is set to 2.

最后，对于目标帧图像的每张分割标记，分别计算该分割标记与二值化的光流一致性标记的交叠区域的大小，并比较其与分割标记的整体大小的关系，如果交叠区域大小大于分割标记的整体面积的0.3倍，则将该分割标记加入二值化的光流一致性标记中，将每张分割标记均进行比较后，最终得到组合标记。Finally, for each segmentation mark of the target frame image, the size of the overlapping area between the segmentation mark and the binary optical flow consistency mark is calculated, and compared with the overall size of the segmentation mark. If the size of the overlapping area is greater than 0.3 times the overall area of the segmentation mark, the segmentation mark is added to the binary optical flow consistency mark. After comparing each segmentation mark, a combined mark is finally obtained.

S206进行光流传播，根据组合标记修复目标帧图像。S206 performs optical flow propagation and repairs the target frame image according to the combined mark.

对于每一对目标帧图像与辅助帧图像，步骤S201中已经计算出对应的一对光流，将辅助帧图像按照前向光流进行传播，根据步骤S205中计算出的组合标记，如果传播结果的像素在组合标记中则保存，不然则丢弃；最后将所有选择的像素在对应位置上取均值作为填充内容填入目标帧图像中，完成修复。For each pair of target frame image and auxiliary frame image, a corresponding pair of optical flows has been calculated in step S201. The auxiliary frame image is propagated according to the forward optical flow. According to the combined mark calculated in step S205, if the pixel of the propagation result is in the combined mark, it is saved, otherwise it is discarded; finally, the average of all selected pixels at the corresponding positions is taken as the filling content and filled into the target frame image to complete the repair.

本领域普通技术人员可以理解，上文中所公开方法中的全部或某些步骤、系统、装置中的功能模块/单元可以被实施为软件、固件、硬件及其适当的组合。在硬件实施方式中，在以上描述中提及的功能模块/单元之间的划分不一定对应于物理组件的划分；例如，一个物理组件可以具有多个功能，或者一个功能或步骤可以由若干物理组件合作执行。某些组件或所有组件可以被实施为由处理器，如数字信号处理器或微处理器执行的软件，或者被实施为硬件，或者被实施为集成电路，如专用集成电路。这样的软件可以分布在计算机可读介质上，计算机可读介质可以包括计算机存储介质(或非暂时性介质)和通信介质(或暂时性介质)。如本领域普通技术人员公知的，术语计算机存储介质包括在用于存储信息(诸如计算机可读指令、数据结构、程序模块或其他数据)的任何方法或技术中实施的易失性和非易失性、可移除和不可移除介质。计算机存储介质包括但不限于RAM、ROM、EEPROM、闪存或其他存储器技术、CD-ROM、数字多功能盘(DVD)或其他光盘存储、磁盒、磁带、磁盘存储或其他磁存储装置、或者可以用于存储期望的信息并且可以被计算机访问的任何其他的介质。此外，本领域普通技术人员公知的是，通信介质通常包含计算机可读指令、数据结构、程序模块或者诸如载波或其他传输机制之类的调制数据信号中的其他数据，并且可包括任何信息递送介质。It will be appreciated by those skilled in the art that all or some of the steps, systems, and functional modules/units in the methods disclosed above may be implemented as software, firmware, hardware, and appropriate combinations thereof. In hardware implementations, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, a physical component may have multiple functions, or a function or step may be performed by several physical components in cooperation. Some or all components may be implemented as software executed by a processor, such as a digital signal processor or a microprocessor, or implemented as hardware, or implemented as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on a computer-readable medium, which may include a computer storage medium (or non-transitory medium) and a communication medium (or temporary medium). As known to those skilled in the art, the term computer storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, it is well known to those of ordinary skill in the art that communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Claims

1. A video image editing method, comprising:

Selecting a frame of target frame image and a plurality of frames of auxiliary frame images in an input video;

Pairing the target frame image and the multi-frame auxiliary frame image respectively to obtain a plurality of pairing combinations, wherein each pairing combination comprises the target frame image and one frame of auxiliary frame image;

Extracting a pair of optical flows for each pairing combination respectively, including: reverse optical flow and forward optical flow; the reverse optical flow is the optical flow from the target frame image to the auxiliary frame image in the pairing combination, and the forward optical flow is the optical flow from the auxiliary frame image to the target frame image in the pairing combination;

Obtaining consistency marks of the images according to consistency judgment results of the reverse optical flow and the forward optical flow in each pair of optical flows; the consistency marks of the images comprise consistency marks of the positions of each grid point in the target frame image;

obtaining a segmentation mark corresponding to each instance in the target frame image;

Determining the coincidence degree of the segmentation marks corresponding to each instance and the consistency marks respectively, and combining the segmentation marks with the coincidence degree higher than a preset condition with the consistency marks to obtain a combined mark, wherein the combined mark is used for marking the result of removing the region to be removed from the target frame image;

the step of obtaining the consistency mark of the image according to the consistency judgment result of the reverse optical flow and the forward optical flow in each pair of optical flows comprises the following steps:

For each pairing combination, respectively obtaining an initial consistency mark of the grid point position in the pairing combination according to a consistency judgment result between the grid point position of the target frame image in the pairing combination and the corresponding grid point position in the auxiliary frame image;

And accumulating the initial consistency marks of the grid point positions in all the pairing combinations, and obtaining the consistency marks of the grid point positions according to the accumulation result.

2. The video image editing method according to claim 1, wherein for each pairing combination, obtaining an initial consistency flag of the grid point position in the pairing combination according to a consistency determination result between the grid point position of the target frame image in the pairing combination and the corresponding grid point position in the auxiliary frame image, respectively, includes:

Obtaining new lattice point positions in corresponding pairing combinations according to each pair of light flows respectively; the new lattice point position is calculated according to the combination of the lattice point position corresponding to the lattice point position of the target frame image in the auxiliary frame image in the pairing combination and the forward optical flow;

And for each pairing combination, acquiring an initial consistency mark of the grid point position in the pairing combination according to a consistency judgment result between the grid point position in the target frame image and a new grid point position in the pairing combination.

3. The video image editing method according to claim 2, wherein obtaining new lattice positions in the corresponding pairing combinations from each pair of light streams, respectively, comprises:

For each pair of optical flows, propagating the grid point position of the target frame image to the auxiliary frame image according to the reverse optical flow, and calculating the new grid point position according to the corresponding grid point position of the target frame image on the auxiliary frame image and the forward optical flow;

For each pairing combination, respectively obtaining the initial consistency mark of the grid point position in the pairing combination according to the consistency judgment result between the grid point position in the target frame image and the new grid point position in the pairing combination comprises the following steps:

the following treatments were performed for each pairing combination, respectively:

And comparing the grid point position in the target frame image, and returning the new grid point position in the pairing combination to a difference value between the grid point positions on the target frame image according to the forward optical flow, and recording an initial consistency mark of the grid point position as a value representing consistency when the difference value is smaller than or equal to a preset first threshold value.

4.A video image editing method according to claim 3, wherein the difference value comprises a euclidean distance.

5. The video image editing method according to claim 2, wherein said obtaining the consistency flag of the lattice point position according to the accumulation result comprises:

multiplying the accumulated result with the maximum optical flow mark of the target frame image to obtain a first mark, and carrying out threshold value truncation processing on the first mark to obtain the binarized consistency mark of the grid point position; wherein the maximum optical flow of the target frame image is marked as the maximum value of the modular length of the optical flow of each pixel point on the target frame image.

6. The video image editing method according to claim 5, wherein the maximum optical flow label of the target frame image is calculated by:

And respectively calculating the modulo length of the optical flow on each pixel point for the backward optical flow in each pair of optical flows, and taking the maximum value of the modulo length of all the optical flows on each pixel point as the maximum optical flow mark of the target frame image.

7. The video image editing method according to claim 5, wherein said performing a threshold truncation process includes:

And carrying out threshold truncation processing by adopting a second threshold, wherein the second threshold is set to be a multiple of the average value of the first mark, and the average value of the first mark is the quotient of the number of the first mark and the pairing combination.

8. The video image editing method according to claim 5, wherein said combining the split mark with the overlap ratio higher than a preset condition with the consistency mark to obtain a combined mark comprises:

And respectively judging the coincidence degree of each segmentation mark and the binarized consistency mark, and adding the segmentation mark with the coincidence degree higher than the preset condition into the binarized consistency mark to obtain the combined mark.

9. The video image editing method according to claim 8, wherein the overlap ratio is an overlap ratio of an overlapping region of each of the division marks and the consistency mark, the preset condition is a third threshold value set as a multiple of an area value of the division mark currently being judged.

10. The video image editing method according to claim 1, wherein selecting a target frame image and a multi-frame auxiliary frame image in the input video comprises:

And taking the target frame image as a center, and selecting all images in the input video at fixed intervals as auxiliary frame images.

11. A method for video image restoration, comprising: the video image editing method of any of the preceding claims 1-10, the method further comprising:

and respectively spreading the auxiliary frame images of each frame according to the forward optical flow, selecting the required pixels in the spreading result according to the combined mark, taking the average value of all the selected pixels at the corresponding positions as filling content, and filling the filling content into the target frame image to obtain the repaired image.

12. A video image editing apparatus comprising a memory and a processor, wherein the memory is for storing a program for performing video image editing; the processor is configured to read the program for video image editing and execute the video image editing method according to any one of claims 1 to 10.

13. A video image restoration device comprising a memory and a processor, wherein the memory is used for storing a program for performing video image restoration; the processor is configured to read the program for performing video image restoration and perform the video image restoration method according to claim 11.

14. A computer readable storage medium storing computer executable instructions for performing the method of any one of claims 1-10 or claim 11.