
CN116486107B - Optical flow calculation method, system, equipment and medium - Google Patents

Optical flow calculation method, system, equipment and medium

Info

Publication number
CN116486107B
CN116486107B (application CN202310735464.6A)
Authority
CN
China
Prior art keywords
image
optical flow
global
motion information
feature
Prior art date
Legal status
Active
Application number
CN202310735464.6A
Other languages
Chinese (zh)
Other versions
CN116486107A (en)
Inventor
王子旭
葛利跃
陈震
张聪炫
卢锋
吕科
胡卫明
Current Assignee
Nanchang Hangkong University
Original Assignee
Nanchang Hangkong University
Priority date
Filing date
Publication date
Application filed by Nanchang Hangkong University
Priority to CN202310735464.6A
Publication of CN116486107A
Application granted
Publication of CN116486107B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an optical flow calculation method, system, device, and medium, relating to the field of optical flow processing. The method includes: acquiring a target image, the target image comprising two consecutive frames, namely a first image and a second image; extracting motion features of the target image with a motion feature extraction network; determining a feature map of the first image and a feature map of the second image from the motion features of the target image and the number of feature extraction channels, and calculating the matching cost volume of the two feature maps; extracting context features of the first image with a context encoder; and, based on the matching cost volume and the context features, performing iterative solving with a global-local recurrent optical flow decoder to obtain the optical flow field of the target image. The global-local recurrent optical flow decoder is built from depthwise separable residual blocks, multilayer perceptron blocks, a depthwise separable convolution module, and a multi-head attention module. The invention improves the accuracy and robustness of optical flow estimation.

Description

Optical flow calculation method, system, device, and medium

Technical Field

The present invention relates to the field of optical flow processing, and in particular to an optical flow calculation method, system, device, and medium.

Background

Optical flow is the two-dimensional motion vector of moving objects and scene-surface pixels in an image sequence. It not only provides the motion vectors of the objects and scene in the image, but also carries rich shape and structure information. Optical flow estimation is therefore a research hotspot in image processing and computer vision, and it serves as a basic building block that provides valuable motion-correlation cues in many high-level vision tasks such as action recognition, video interpolation, video segmentation, and object tracking.

In recent years, with the rise of deep learning, optical flow estimation models based on convolutional neural networks (CNNs) have achieved great success. Such methods first use a data-driven, learning-based optimization strategy and a modeled feature encoder to extract image features. They then compute the similarity of all feature vectors between the two feature maps, treat the pair of feature vectors with the highest similarity as a matching point, and finally decode the displacement field between consecutive frames. Because the encoding and decoding process requires features with sufficient discriminative power to reduce matching errors caused by large-displacement motion and local ambiguity (occlusion, weak texture, illumination changes, etc.), efficiently and accurately decoding motion features is key to improving the accuracy and robustness of optical flow estimation. However, existing deep-learning optical flow models usually decode optical flow with local convolution operations whose receptive fields are limited, so the model's ability to extract and represent image features is insufficient, which in turn degrades the overall performance of optical flow estimation.

Summary of the Invention

Based on this, embodiments of the present invention provide an optical flow calculation method, system, device, and medium to improve the accuracy and robustness of optical flow estimation.

To achieve the above purpose, the embodiments of the present invention provide the following solutions:

An optical flow calculation method, comprising:

acquiring a target image, where the target image comprises a first image and a second image, the first image and the second image being two consecutive frames;

extracting motion features of the target image with a motion feature extraction network, where the motion feature extraction network comprises a plurality of convolutional layers of different sizes;

determining a feature map of the first image and a feature map of the second image according to the motion features of the target image and the number of feature extraction channels, and calculating the matching cost volume of the feature map of the first image and the feature map of the second image;

extracting context features of the first image with a context encoder, where the structure of the context encoder is the same as that of the motion feature extraction network;

based on the matching cost volume and the context features, performing iterative solving with a global-local recurrent optical flow decoder to obtain the optical flow field of the target image;

wherein the global-local recurrent optical flow decoder comprises a local motion information encoder, a global motion information encoder, and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder;

the local motion information encoder and the global-local motion information decoder each comprise a depthwise separable residual block and a multilayer perceptron block connected in sequence; the global motion information encoder comprises a depthwise separable convolution module and a multi-head attention module connected in sequence;

the local motion information encoder encodes the matching cost volume and the residual optical flow of the previous iteration to obtain local motion features;

the global motion information encoder encodes the local motion features and the context features to obtain global motion information;

the global-local motion information decoder decodes the local motion features, the global motion information, and the context features to obtain the residual optical flow of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image.

Optionally, the motion feature extraction network specifically comprises a first convolutional layer, a convolutional residual block, and a second convolutional layer connected in sequence;

the convolution kernel of the first convolutional layer is 7×7; the convolutional residual block comprises a third convolutional layer and a fourth convolutional layer connected in sequence; the convolution kernel of the second convolutional layer is 1×1; the convolution kernel of the third convolutional layer is 3×3 with a stride of 2; the convolution kernel of the fourth convolutional layer is 3×3 with a stride of 1.

Optionally, determining the feature map of the first image and the feature map of the second image according to the motion features of the target image and the number of feature extraction channels, and calculating the matching cost volume of the two feature maps, specifically comprises:

determining the first half of the motion features of the target image as the feature map of the first image, and the second half of the motion features of the target image as the feature map of the second image;

performing a dot-product similarity operation on the feature map of the first image and the feature map of the second image to obtain the matching cost information of the two feature maps;

downsampling the matching cost information with a pooling operation to obtain the matching cost volume of the feature map of the first image and the feature map of the second image.

Optionally, the depthwise separable residual block specifically comprises a first depthwise separable convolutional layer, a first activation function, a second depthwise separable convolutional layer, and a second activation function connected in sequence.

The convolution kernel of the first depthwise separable convolutional layer is 7×7; the second depthwise separable convolutional layer is densely connected with a 15×15 convolution kernel; the first activation function and the second activation function are both GELU activation functions.

Optionally, the multilayer perceptron block specifically comprises a fifth convolutional layer, a third depthwise separable convolutional layer, a third activation function, and a sixth convolutional layer connected in sequence.

The convolution kernels of the fifth and sixth convolutional layers are 1×1; the convolution kernel of the third depthwise separable convolutional layer is 3×3; the third activation function is a GELU activation function.

The present invention also provides an optical flow computing system, comprising:

an image acquisition module, configured to acquire a target image, where the target image comprises a first image and a second image, the first image and the second image being two consecutive frames;

a motion feature extraction module, configured to extract motion features of the target image with a motion feature extraction network, where the motion feature extraction network comprises a plurality of convolutional layers of different sizes;

a matching cost calculation module, configured to determine a feature map of the first image and a feature map of the second image according to the motion features of the target image and the number of feature extraction channels, and to calculate the matching cost volume of the feature map of the first image and the feature map of the second image;

a context feature extraction module, configured to extract context features of the first image with a context encoder, where the structure of the context encoder is the same as that of the motion feature extraction network;

an optical flow field solving module, configured to perform iterative solving with a global-local recurrent optical flow decoder based on the matching cost volume and the context features, to obtain the optical flow field of the target image;

wherein the global-local recurrent optical flow decoder comprises a local motion information encoder, a global motion information encoder, and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder;

the local motion information encoder and the global-local motion information decoder each comprise a depthwise separable residual block and a multilayer perceptron block connected in sequence; the global motion information encoder comprises a depthwise separable convolution module and a multi-head attention module connected in sequence;

the local motion information encoder encodes the matching cost volume and the residual optical flow of the previous iteration to obtain local motion features;

the global motion information encoder encodes the local motion features and the context features to obtain global motion information;

the global-local motion information decoder decodes the local motion features, the global motion information, and the context features to obtain the residual optical flow of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image.

The present invention also provides an electronic device comprising a memory and a processor, where the memory stores a computer program and the processor runs the computer program to cause the electronic device to execute the above optical flow calculation method.

The present invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above optical flow calculation method.

According to the specific embodiments provided by the present invention, the invention discloses the following technical effects:

To address the insufficient feature extraction capability of existing optical flow estimation models, the embodiments of the present invention introduce depthwise separable residual blocks and multilayer perceptron blocks to enlarge the receptive field, and construct a global-local recurrent optical flow decoder from the local characteristics of the local motion information encoder and the global characteristics of the global motion information encoder. As an optical flow estimation model that associates global and local motion information, this decoder improves the accuracy and robustness of optical flow estimation in large-displacement and weakly textured image regions.

Brief Description of the Drawings

To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of the optical flow calculation method provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of the first image provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of the second image provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of the global-local recurrent optical flow decoder provided by an embodiment of the present invention;

Fig. 5 is a visualization of the optical flow field provided by an embodiment of the present invention;

Fig. 6 is a structural diagram of the optical flow computing system provided by an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

To make the above objects, features, and advantages of the present invention more comprehensible, the present invention is described in further detail below with reference to the drawings and specific embodiments.

Embodiment 1

Referring to Fig. 1, the optical flow calculation method of this embodiment includes:

Step 101: Acquire a target image; the target image includes a first image and a second image, which are two consecutive frames.

In this embodiment, the consecutive 30th and 31st frames of the bamboo_3 image sequence are selected and input, where the 30th frame serves as the first image I1, as shown in Fig. 2, and the 31st frame serves as the second image I2, as shown in Fig. 3.

Step 102: Extract the motion features of the target image with a motion feature extraction network; the motion feature extraction network includes multiple convolutional layers of different sizes.

Specifically, this step first constructs the motion feature extraction network: multiple stacked consecutive convolutions are combined to extract motion features from the first image I1 and the second image I2. The input of the network is the stacked image F obtained by stacking I1 and I2 along the channel dimension, where H denotes the input image height and W the input image width; the output is the motion feature FM of the two consecutive frames.

The motion feature extraction network specifically includes a first convolutional layer, a convolutional residual block, and a second convolutional layer connected in sequence. The kernel of the first convolutional layer is 7×7; the convolutional residual block includes a third convolutional layer and a fourth convolutional layer connected in sequence; the kernel of the second convolutional layer is 1×1; the kernel of the third convolutional layer is 3×3 with a stride of 2; the kernel of the fourth convolutional layer is 3×3 with a stride of 1.

The motion feature extraction network is divided into three stages: Stage 1 at 1/2 resolution, Stage 2 at 1/4 resolution, and Stage 3 at 1/8 resolution. First, the 7×7 first convolutional layer of Stage 1 controls downsampling and extracts features; then the features pass through the two 3×3 convolutional residual blocks stacked consecutively as Stage 2 and Stage 3, each stage downsampling the image by a factor of 2; finally, a 1×1 second convolutional layer adjusts the number of channels and outputs the motion feature FM of the two consecutive frames. The specific calculation formula is as follows:

FM = Conv1×1(ConvBlock3×3(ConvBlock3×3(Conv7×7(F))))

The above formula represents the feature extraction process of the motion feature extraction network. Conv7×7(·) and Conv1×1(·) denote feature extraction on the image with the 7×7 first convolutional layer and the 1×1 second convolutional layer, respectively; ConvBlock3×3(·) denotes feature extraction on the image with a convolutional residual block consisting of a third convolutional layer (stride 2, 3×3 kernel) and a fourth convolutional layer (stride 1, 3×3 kernel).

where

f1 = relu(Conv3×3,2(x1) + x1)

f = relu(Conv3×3,1(f1) + f1)

f1 denotes the output obtained after the third convolutional layer (stride 2, 3×3 kernel) downsamples and extracts features from the input image x1, followed by a residual connection and the relu activation function; f denotes the output obtained after the fourth convolutional layer (stride 1, 3×3 kernel) extracts features from the input f1, followed by a residual connection and the relu activation function.
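For illustration, a minimal PyTorch sketch of this encoder follows. The channel widths (64, 96, 128, 256) and the 1×1 strided projection that makes the stride-2 residual addition shape-consistent are assumptions not fixed by the text above:

```python
import torch
import torch.nn as nn

class ConvResBlock(nn.Module):
    """Convolutional residual block: a 3x3 stride-2 layer then a 3x3
    stride-1 layer, each followed by a residual connection and relu."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv_s2 = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1)
        self.conv_s1 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1)
        # 1x1 strided projection so the skip matches the downsampled shape (assumption)
        self.proj = nn.Conv2d(in_ch, out_ch, 1, stride=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x1):
        f1 = self.relu(self.conv_s2(x1) + self.proj(x1))  # f1 = relu(Conv3x3,2(x1) + x1)
        return self.relu(self.conv_s1(f1) + f1)           # f  = relu(Conv3x3,1(f1) + f1)

class FeatureEncoder(nn.Module):
    """Stage 1: 7x7 stride-2 conv (1/2 res); Stages 2-3: convolutional
    residual blocks (1/4 and 1/8 res); a final 1x1 conv adjusts channels."""
    def __init__(self, in_ch=6, out_ch=256):
        super().__init__()
        self.stage1 = nn.Conv2d(in_ch, 64, 7, stride=2, padding=3)
        self.stage2 = ConvResBlock(64, 96)
        self.stage3 = ConvResBlock(96, 128)
        self.head = nn.Conv2d(128, out_ch, 1)

    def forward(self, x):
        return self.head(self.stage3(self.stage2(self.stage1(x))))

# F_M for the two stacked RGB frames (6 input channels):
# F_M = FeatureEncoder(in_ch=6)(torch.cat([I1, I2], dim=1))
```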

Step 103: Determine the feature map of the first image and the feature map of the second image according to the motion features of the target image and the number of feature extraction channels, and compute the matching cost volume of the two feature maps.

This step specifically includes:

(1) Split the motion feature FM into two halves along the feature extraction channel dimension: the first half of FM is determined as the feature map F1 of the first image, and the second half as the feature map F2 of the second image.

(2) Perform a dot-product similarity operation on the feature vectors of the feature map of the first image and the feature map of the second image, thereby obtaining the matching cost information between all related point pairs on the two feature maps.

(3) Downsample the matching cost information with a pooling operation, converting large-displacement matching cost information into small-displacement matching cost information, to obtain the matching cost volume of the two feature maps. The matching cost volume takes the form of a multi-scale matching cost pyramid, and the calculation formula is as follows:

Cost = F1 ⊗ F2

Costk = AvgPool(Cost)

where Cost denotes the matching cost information; ⊗ denotes the matrix multiplication operation; AvgPool denotes the average pooling operation; k denotes the layer index of the multi-scale matching cost pyramid; and Costk denotes the matching cost volume of the k-th layer of the pyramid, obtained by downsampling the matching cost information. As k changes, the size of each feature map in Cost changes, and the change in size is determined by the stride of the average pooling operation. This embodiment obtains the matching cost volume of every layer so as to better estimate optical flow in the presence of both large and small displacements.
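A minimal sketch of this all-pairs matching cost computation and the pooled pyramid follows; the tensor layout and the number of pyramid levels are assumptions:

```python
import torch
import torch.nn.functional as F

def cost_volume_pyramid(F1, F2, num_levels=4):
    """All-pairs dot-product matching cost, pooled into a multi-scale
    pyramid. F1, F2: (B, C, H, W) feature maps of the two frames."""
    B, C, H, W = F1.shape
    f1 = F1.view(B, C, H * W)
    f2 = F2.view(B, C, H * W)
    # Cost = F1 (x) F2: dot-product similarity between every pixel pair
    cost = torch.matmul(f1.transpose(1, 2), f2)        # (B, H*W, H*W)
    cost = cost.view(B * H * W, 1, H, W)               # one cost map per source pixel
    pyramid = [cost]
    for _ in range(num_levels - 1):
        # Cost_k = AvgPool(Cost): large displacements become small ones
        cost = F.avg_pool2d(cost, kernel_size=2, stride=2)
        pyramid.append(cost)
    return pyramid
```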

Step 104: Extract the context features of the first image with a context encoder; the structure of the context encoder is the same as that of the motion feature extraction network.

The context encoder in this step is structurally parallel to the motion feature extraction network of step 102. It combines multiple stacked consecutive convolutions to extract context features from the first image I1 of the image sequence; its input is the first image I1, and its output is the context feature FC of the first image.

Specifically, the context encoder is divided into three stages: Stage 1 at 1/2 resolution, Stage 2 at 1/4 resolution, and Stage 3 at 1/8 resolution. First, the 7×7 first convolutional layer of Stage 1 controls downsampling and extracts features; then the features pass through the two 3×3 convolutional residual blocks stacked consecutively as Stage 2 and Stage 3, each stage downsampling the image by a factor of 2; finally, a 1×1 second convolutional layer adjusts the number of channels and outputs the context feature FC of the first image. The calculation formula is as follows:

FC = Conv1×1(ConvBlock3×3(ConvBlock3×3(Conv7×7(I1))))
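Since the context encoder has the same structure as the motion feature extraction network, the sketch given under step 102 can be reused; a usage note under the same assumptions:

```python
# Same architecture as the motion encoder, but single-frame input
# (3 channels for RGB is an assumption):
context_encoder = FeatureEncoder(in_ch=3, out_ch=256)
# F_C = context_encoder(I1)
```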

Step 105: Based on the matching cost volume and the context features, perform iterative solving with the global-local recurrent optical flow decoder to obtain the optical flow field of the target image.

Referring to Fig. 4, the global-local recurrent optical flow decoder includes a local motion information encoder, a global motion information encoder, and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder.

Each part of the global-local recurrent optical flow decoder is described in further detail below with reference to Fig. 4.

(1) Local motion information encoder.

The local motion information encoder includes a depthwise separable residual block and a multilayer perceptron (MLP) block connected in sequence; the global motion information encoder includes a depthwise separable convolution module and a multi-head attention module connected in sequence.

The depthwise separable residual block specifically includes a first depthwise separable convolutional layer, a first activation function, a second depthwise separable convolutional layer, and a second activation function connected in sequence. The kernel of the first depthwise separable convolutional layer is 7×7; the second depthwise separable convolutional layer is densely connected with a 15×15 kernel; both activation functions are GELU. In this embodiment the second depthwise separable convolutional layer specifically includes a seventh convolutional layer with a 1×1 kernel, an eighth convolutional layer with a 15×15 kernel, and a ninth convolutional layer with a 1×1 kernel, connected in sequence; a GELU activation follows each convolution, and a residual connection is applied after each convolution.

The multilayer perceptron block specifically includes a fifth convolutional layer, a third depthwise separable convolutional layer, a third activation function, and a sixth convolutional layer connected in sequence. The kernels of the fifth and sixth convolutional layers are 1×1; the kernel of the third depthwise separable convolutional layer is 3×3; the third activation function is GELU.
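A minimal PyTorch sketch of these two building blocks follows. The channel width argument and the hidden expansion ratio of the MLP block are assumptions, as is the exact placement of the residual additions where the text leaves it open:

```python
import torch.nn as nn

class RDSCBlock(nn.Module):
    """Depthwise separable residual block: a 7x7 depthwise conv, then a
    dense 1x1 -> 15x15 depthwise -> 1x1 stack; each conv is followed by
    GELU and a residual connection."""
    def __init__(self, ch):
        super().__init__()
        self.dw7 = nn.Conv2d(ch, ch, 7, padding=3, groups=ch)
        self.pw_in = nn.Conv2d(ch, ch, 1)
        self.dw15 = nn.Conv2d(ch, ch, 15, padding=7, groups=ch)
        self.pw_out = nn.Conv2d(ch, ch, 1)
        self.act = nn.GELU()

    def forward(self, x):
        x = x + self.act(self.dw7(x))
        x = x + self.act(self.pw_in(x))
        x = x + self.act(self.dw15(x))
        return x + self.act(self.pw_out(x))

class MLPBlock(nn.Module):
    """MLP block implemented with convolutions: 1x1 conv, 3x3 depthwise
    conv, GELU, 1x1 conv (hidden expansion ratio is an assumption)."""
    def __init__(self, ch, hidden_ratio=4):
        super().__init__()
        h = ch * hidden_ratio
        self.net = nn.Sequential(
            nn.Conv2d(ch, h, 1),
            nn.Conv2d(h, h, 3, padding=1, groups=h),
            nn.GELU(),
            nn.Conv2d(h, ch, 1),
        )

    def forward(self, x):
        return self.net(x)
```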

The local motion information encoder encodes according to the matching cost volume and the residual optical flow of the previous iteration to obtain local motion features. Specifically, its input is the motion features formed from the motion information in the matching cost volume and the initial optical flow field; these features are fed into the local motion information encoder for recurrent iterative encoding of the motion features, and the output is the local motion feature Fl of this iteration, computed as follows:

Ff = Conv1×1(Δf)

Fm = DLP(Cat(Fcost, Ff))

Fl = Cat(Fm, Δf)

DLP(·) = MLP(RDSCBlocks(·))

RDSCBlocks(x2) = GELU(DwConv15×15,d(GELU(DwConv7×7(x2))))

MLP(x3) = Conv1×1(GELU(DwConv3×3(Conv1×1(x3))))

where Δf denotes the residual optical flow of each iteration; DLP(·) denotes the depthwise separable MLP block; Cat(·) denotes the feature map concatenation operation, i.e., concatenating multiple feature maps of the same resolution along the channel dimension; Conv1×1(Δf) denotes feature extraction on the input Δf with a 1×1 convolutional layer; Ff denotes the feature extracted from the optical flow of this iteration; Fcost denotes the local motion feature indexed from the matching cost volume by the optical flow of this iteration; Fm denotes the local motion feature enhanced by the DLP block; Fl denotes the local motion feature of this iteration. RDSCBlocks(·) denotes the depthwise separable residual block; MLP(·) denotes the multilayer perceptron block implemented with convolutions; DwConv7×7(x2) denotes feature extraction on the input x2 with the first depthwise separable convolutional layer of kernel 7×7; DwConv15×15,d(·) denotes feature extraction with the densely connected second depthwise separable convolutional layer of kernel 15×15; GELU(·) denotes the GELU activation function; Conv1×1(x3) denotes feature extraction on the input x3 with the fifth convolutional layer of kernel 1×1; DwConv3×3(·) denotes feature extraction with the third depthwise separable convolutional layer of kernel 3×3.

(2) Global motion information encoder.

The global motion information encoder includes a depthwise separable convolution module and a multi-head attention module connected in sequence.

The global motion information encoder encodes according to the local motion features and the context features to obtain global motion information. Specifically, a depthwise separable convolution module with local positional encoding and a multi-head attention module are constructed; the local motion features Fl output by the local motion information encoder and the context features FC output by the context encoder are fed into the global motion information encoder, which finally outputs the global motion information Fg, computed as follows:

Fg(i, j) = Fl(i, j) + γ Σu,v f(qc(i, j), kc(u, v)) vm(u, v)

The global motion information encoder encodes the context feature FC to obtain the query vector qc and the key vector kc. Here i and j denote the horizontal and vertical coordinates of a point in the query vector, and u and v denote the horizontal and vertical coordinates of a point in the key vector. qc(i, j) denotes the feature value at a point of the query vector, and kc(u, v) the feature value at a point of the key vector. vm denotes the value vector constructed from the local motion feature Fl, and vm(u, v) the feature value at a point of the value vector. γ is a learnable factor; f(·) denotes the dot-product attention function; Fl(i, j) denotes the feature value at a point of the local motion feature.
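A minimal single-head sketch of this global aggregation follows (the embodiment specifies multi-head attention; the head count, channel widths, and softmax scaling here are assumptions):

```python
import torch
import torch.nn as nn

class GlobalMotionEncoder(nn.Module):
    """Aggregates local motion features globally: queries and keys come
    from the context features, values from the local motion features, and
    the attended result is added back scaled by a learnable factor gamma."""
    def __init__(self, ctx_ch, motion_ch, dim=128):
        super().__init__()
        self.to_q = nn.Conv2d(ctx_ch, dim, 1)
        self.to_k = nn.Conv2d(ctx_ch, dim, 1)
        self.to_v = nn.Conv2d(motion_ch, motion_ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable factor gamma

    def forward(self, F_c, F_l):
        B, C, H, W = F_l.shape
        q = self.to_q(F_c).flatten(2).transpose(1, 2)   # (B, HW, dim): q_c(i, j)
        k = self.to_k(F_c).flatten(2)                   # (B, dim, HW): k_c(u, v)
        v = self.to_v(F_l).flatten(2).transpose(1, 2)   # (B, HW, C): v_m(u, v)
        attn = torch.softmax(q @ k / q.shape[-1] ** 0.5, dim=-1)  # f(q_c, k_c)
        out = (attn @ v).transpose(1, 2).reshape(B, C, H, W)
        return F_l + self.gamma * out  # F_g = F_l + gamma * sum f(q, k) v_m
```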

(3) Global-local motion information decoder.

The structure of the global-local motion information decoder is the same as that of the local motion information encoder and is not repeated here.

The global-local motion information decoder decodes according to the local motion features, the global motion information, and the context features to obtain the residual optical flow of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image.

Specifically, the input of the global-local motion information decoder is the global and local motion information aggregated from the local motion features output by the local motion information encoder and the hidden-state features output by the global motion information encoder and the context encoder; the output is the residual optical flow Δf of this iteration, computed as follows:

Fa = Cat(Fg, Fl, FC)

ht = DLP(Cat(ht-1, Fa))

Δf = Conv3×3(ht)

where Fa denotes the aggregated feature obtained from the global motion information, the local motion features, and the context features; ht denotes the global-local motion feature of this iteration; and ht-1 denotes the global-local motion feature of the previous iteration.

The global-local recurrent optical flow decoder of this embodiment iteratively optimizes the optical flow field over n iterations, and upsamples the optical flow after the last iteration to the same resolution as the input image, thereby obtaining the final optical flow field. In the first iteration, the initial optical flow field f is set to 0 and the residual optical flow to 0; the flow is then iterated, with the optical flow field updated after each iteration as f = f + Δf, and computing the residual optical flow of the last iteration determines the final optical flow field. The final optical flow field visualization is shown in Fig. 5.
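A minimal sketch of the recurrent refinement loop, tying the modules above together, follows. The module interfaces, the hidden-state initialization, and the bilinear upsampling at the end are assumptions; the embodiment only states that the final flow is upsampled to the input resolution:

```python
import torch
import torch.nn.functional as F

def decode_flow(local_enc, global_enc, decoder, cost_pyramid, F_c, n_iters=12):
    """Global-local recurrent decoding: each iteration looks up the cost
    pyramid at the current flow, encodes local then global motion, and
    decodes a residual flow that refines the flow field."""
    B, C, H, W = F_c.shape
    flow = torch.zeros(B, 2, H, W, device=F_c.device)  # initial flow field f = 0
    delta = torch.zeros_like(flow)                     # initial residual flow = 0
    h = torch.zeros(B, C, H, W, device=F_c.device)     # hidden state (assumption)
    for _ in range(n_iters):
        F_l = local_enc(cost_pyramid, flow, delta)     # local motion features
        F_g = global_enc(F_c, F_l)                     # global motion information
        delta, h = decoder(F_l, F_g, F_c, h)           # residual flow of this iteration
        flow = flow + delta                            # f <- f + delta_f
    # upsample the 1/8-resolution flow to input resolution; plain bilinear
    # interpolation here, since the embodiment does not name the method
    return 8 * F.interpolate(flow, scale_factor=8, mode="bilinear",
                             align_corners=False)
```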

This embodiment leverages the local modeling capability of depthwise separable convolutional residual blocks with large kernels and the long-range modeling capability of a Transformer with local positional encoding to strengthen the capture of motion features and of the motion relationships between more pixels, thereby improving the accuracy of optical flow estimation in large-displacement and weakly textured image regions, reducing errors caused by purely local information, and ensuring the reliability and robustness of optical flow estimation.

Embodiment 2

To carry out the method corresponding to Embodiment 1 above and achieve the corresponding functions and technical effects, an optical flow computing system is provided below.

Referring to Fig. 6, the system includes:

The image acquisition module 601 is configured to acquire a target image; the target image includes a first image and a second image, which are two consecutive frames.

The motion feature extraction module 602 is configured to extract the motion features of the target image with a motion feature extraction network; the motion feature extraction network includes multiple convolutional layers of different sizes.

The matching cost calculation module 603 is configured to determine the feature map of the first image and the feature map of the second image according to the motion features of the target image and the number of feature extraction channels, and to compute the matching cost volume of the two feature maps.

The context feature extraction module 604 is configured to extract the context features of the first image with a context encoder; the structure of the context encoder is the same as that of the motion feature extraction network.

The optical flow field solving module 605 is configured to perform iterative solving with a global-local recurrent optical flow decoder based on the matching cost volume and the context features, to obtain the optical flow field of the target image.

The global-local recurrent optical flow decoder includes a local motion information encoder, a global motion information encoder, and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder.

The local motion information encoder and the global-local motion information decoder each include a depthwise separable residual block and a multilayer perceptron block connected in sequence; the global motion information encoder includes a depthwise separable convolution module and a multi-head attention module connected in sequence.

The local motion information encoder encodes the matching cost volume and the residual optical flow of the previous iteration to obtain local motion features. The global motion information encoder encodes the local motion features and the context features to obtain global motion information. The global-local motion information decoder decodes the local motion features, the global motion information, and the context features to obtain the residual optical flow of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image.

Embodiment 3

This embodiment provides an electronic device including a memory and a processor; the memory stores a computer program, and the processor runs the computer program to cause the electronic device to execute the optical flow calculation method of Embodiment 1.

Optionally, the above electronic device may be a server.

In addition, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the optical flow calculation method of Embodiment 1.

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the identical or similar parts of the embodiments can be referred to one another. Since the system disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief; for related details, refer to the description of the method.

Specific examples are used herein to explain the principles and implementations of the present invention. The above descriptions of the embodiments are only intended to help understand the method of the present invention and its core idea; meanwhile, those of ordinary skill in the art may make changes to the specific implementation and application scope in accordance with the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (7)

1. An optical flow calculation method, comprising:
acquiring a target image; the target image includes: a first image and a second image; the first image and the second image are two consecutive frames of images;
extracting motion features of the target image with a motion feature extraction network; the motion feature extraction network comprises a plurality of convolutional layers of different sizes;
determining a feature map of the first image and a feature map of the second image according to the motion features of the target image and the number of feature extraction channels, and calculating a matching cost volume of the feature map of the first image and the feature map of the second image;
extracting a context feature of the first image using a context encoder; the structure of the context encoder is the same as that of the motion feature extraction network;
based on the matching cost volume and the context feature, performing iterative solving with a global-local recurrent optical flow decoder to obtain an optical flow field of the target image;
wherein the global-local recurrent optical flow decoder comprises: a local motion information encoder, a global motion information encoder, and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder;
the local motion information encoder and the global-local motion information decoder each include: a depthwise separable residual block and a multilayer perceptron block connected in sequence; the global motion information encoder includes: a depthwise separable convolution module and a multi-head attention module connected in sequence;
the local motion information encoder is configured to encode according to the matching cost volume and the residual optical flow of the previous iteration to obtain local motion features;
the global motion information encoder is configured to encode according to the local motion features and the context feature to obtain global motion information;
the global-local motion information decoder is configured to decode according to the local motion features, the global motion information, and the context feature to obtain the residual optical flow of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image;
determining the feature map of the first image and the feature map of the second image according to the motion features of the target image and the number of feature extraction channels, and calculating the matching cost volume of the feature map of the first image and the feature map of the second image specifically comprises:
determining the first half of the motion features of the target image as the feature map of the first image, and determining the second half of the motion features of the target image as the feature map of the second image;
performing a dot-product similarity operation on the feature map of the first image and the feature map of the second image to obtain matching cost information of the feature map of the first image and the feature map of the second image;
and downsampling the matching cost information with a pooling operation to obtain the matching cost volume of the feature map of the first image and the feature map of the second image.
2. The optical flow calculation method according to claim 1, wherein the motion feature extraction network specifically comprises: a first convolutional layer, a convolutional residual block, and a second convolutional layer connected in sequence;
the convolution kernel size of the first convolutional layer is 7×7; the convolutional residual block includes: a third convolutional layer and a fourth convolutional layer connected in sequence; the convolution kernel size of the second convolutional layer is 1×1; the convolution kernel size of the third convolutional layer is 3×3 with a stride of 2; the convolution kernel size of the fourth convolutional layer is 3×3 with a stride of 1.
3. The optical flow calculation method according to claim 1, wherein the depthwise separable residual block comprises: a first depthwise separable convolutional layer, a first activation function, a second depthwise separable convolutional layer, and a second activation function connected in sequence;
the convolution kernel size of the first depthwise separable convolutional layer is 7×7; the second depthwise separable convolutional layer is densely connected and its convolution kernel size is 15×15; the first activation function and the second activation function are both GELU activation functions.
4. The optical flow calculation method according to claim 1, wherein the multilayer perceptron block specifically comprises: a fifth convolutional layer, a third depthwise separable convolutional layer, a third activation function, and a sixth convolutional layer connected in sequence;
the convolution kernel sizes of the fifth convolutional layer and the sixth convolutional layer are 1×1; the convolution kernel size of the third depthwise separable convolutional layer is 3×3; the third activation function is a GELU activation function.
5. An optical flow computing system, comprising:
the image acquisition module is used for acquiring a target image; the target image includes: a first image and a second image; the first image and the second image are two continuous frames of images;
the motion feature extraction module is used for extracting the motion features of the target image by adopting a motion feature extraction network; the motion feature extraction network comprises a plurality of convolution layers with different sizes;
the matching cost calculation module is used for determining a feature map of the first image and a feature map of the second image according to the motion feature and the feature extraction channel number of the target image, and calculating the matching cost volume of the feature map of the first image and the feature map of the second image;
a context feature extraction module for extracting context features of the first image using a context encoder; the structure of the context encoder is the same as that of the motion feature extraction network;
the optical flow field solving module is used for carrying out loop iteration solving by adopting a global-local loop optical flow decoder based on the matching cost volume and the context characteristics to obtain an optical flow field of the target image;
wherein the global-local loop optical flow decoder comprises: a local motion information encoder, a global motion information encoder and a global-local motion information decoder connected in sequence; the output of the global-local motion information decoder is connected to the input of the local motion information encoder;
the local motion information encoder and the global-local motion information decoder each include: the depth separable residual block and the multi-layer perceptron block are connected in sequence; the global motion information encoder includes: a depth separable convolution module and a multi-head attention module which are connected in sequence;
the local motion information encoder is used for encoding according to the matching cost volume and the residual light stream of the last iteration to obtain local motion characteristics;
the global motion information encoder is used for encoding according to the local motion characteristics and the context characteristics to obtain global motion information;
the global-local motion information decoder is used for decoding according to the local motion characteristics, the global motion information and the context characteristics to obtain a residual light stream of the current iteration; the residual optical flow of the last iteration is used to determine the optical flow field of the target image;
wherein the determining of the feature map of the first image and the feature map of the second image according to the motion features of the target image and the number of feature extraction channels, and the calculating of the matching cost volume of the feature map of the first image and the feature map of the second image, specifically comprise:
determining the first half of the motion features of the target image as the feature map of the first image, and determining the second half of the motion features of the target image as the feature map of the second image;
performing a dot-product similarity operation on the feature map of the first image and the feature map of the second image to obtain the matching cost information of the two feature maps;
and downsampling the matching cost information by a pooling operation to obtain the matching cost volume of the feature map of the first image and the feature map of the second image.
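The matching cost computation described above can be sketched as follows; the all-pairs formulation, the use of average pooling and the pyramid depth `num_levels` are assumptions rather than details fixed by the claims.

```python
import torch
import torch.nn.functional as F

def matching_cost_volume(motion_feat: torch.Tensor, num_levels: int = 4):
    """Sketch of the matching cost volume of claim 5. The first half of
    the motion features along the channel axis is the feature map of the
    first image, the second half that of the second image; dot-product
    similarity is computed for all pixel pairs and then downsampled by
    pooling. Average pooling and `num_levels` are assumptions."""
    b, c2, h, w = motion_feat.shape
    c = c2 // 2  # number of feature extraction channels
    f1 = motion_feat[:, :c].flatten(2)  # feature map of the first image, (b, c, h*w)
    f2 = motion_feat[:, c:].flatten(2)  # feature map of the second image, (b, c, h*w)
    # Dot-product similarity between every pixel of image 1 and image 2.
    cost = torch.einsum('bci,bcj->bij', f1, f2)  # (b, h*w, h*w)
    cost = cost.view(b * h * w, 1, h, w)         # one cost map per source pixel
    # Downsample the matching cost information by pooling.
    pyramid = [cost]
    for _ in range(num_levels - 1):
        pyramid.append(F.avg_pool2d(pyramid[-1], kernel_size=2, stride=2))
    return pyramid
```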
6. An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the optical flow calculation method of any one of claims 1 to 4.
7. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the optical flow calculation method according to any one of claims 1 to 4.
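To make the loop-iteration solving of claim 5 concrete, the sketch below shows one plausible control flow; the call signatures of the three sub-modules, the zero initialisation, the iteration count and the accumulation of residual optical flows into the flow field are all assumptions, not details fixed by the claims.

```python
import torch

def global_local_decode(local_enc, global_enc, decoder,
                        cost_volume: torch.Tensor,
                        context: torch.Tensor,
                        num_iters: int = 12) -> torch.Tensor:
    """Sketch of the global-local loop optical flow decoder of claim 5.
    `local_enc`, `global_enc` and `decoder` stand for the local motion
    information encoder, the global motion information encoder and the
    global-local motion information decoder; their signatures, the zero
    initialisation and `num_iters` are assumptions."""
    b, _, h, w = context.shape
    flow = context.new_zeros(b, 2, h, w)      # accumulated optical flow field
    residual = context.new_zeros(b, 2, h, w)  # residual flow of the previous iteration
    for _ in range(num_iters):
        local_feat = local_enc(cost_volume, residual)         # local motion features
        global_info = global_enc(local_feat, context)         # global motion information
        residual = decoder(local_feat, global_info, context)  # residual flow, current iteration
        flow = flow + residual
    # The residual optical flow of the final iteration completes the flow field.
    return flow
```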
CN202310735464.6A 2023-06-21 2023-06-21 Optical flow calculation method, system, equipment and medium Active CN116486107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310735464.6A CN116486107B (en) 2023-06-21 2023-06-21 Optical flow calculation method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310735464.6A CN116486107B (en) 2023-06-21 2023-06-21 Optical flow calculation method, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN116486107A (en) 2023-07-25
CN116486107B (en) 2023-09-05

Family

ID=87219922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310735464.6A Active CN116486107B (en) 2023-06-21 2023-06-21 Optical flow calculation method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN116486107B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118381927B (en) * 2024-06-24 2024-08-23 杭州宇泛智能科技股份有限公司 Dynamic point cloud compression method, system, storage medium and device based on multimodal bidirectional loop scene flow

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11620328B2 (en) * 2020-06-22 2023-04-04 International Business Machines Corporation Speech to media translation

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101713986B1 (en) * 2016-02-17 2017-03-08 한국항공대학교산학협력단 Optical flow estimator for moving object detection and method thereof
CN106973293A (en) * 2017-04-21 2017-07-21 中国科学技术大学 Light field image coding method based on parallax prediction
WO2021035807A1 (en) * 2019-08-23 2021-03-04 深圳大学 Target tracking method and device fusing optical flow information and siamese framework
WO2021201438A1 (en) * 2020-04-01 2021-10-07 Samsung Electronics Co., Ltd. System and method for motion warping using multi-exposure frames
CN111626308A (en) * 2020-04-22 2020-09-04 上海交通大学 Real-time optical flow estimation method based on lightweight convolutional neural network
CN112686952A (en) * 2020-12-10 2021-04-20 中国科学院深圳先进技术研究院 Image optical flow computing system, method and application
CN113554039A (en) * 2021-07-27 2021-10-26 广东工业大学 Optical flow graph generation method and system for dynamic images based on multi-attention mechanism
CN114299105A (en) * 2021-08-04 2022-04-08 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
WO2023056730A1 (en) * 2021-10-09 2023-04-13 深圳市中兴微电子技术有限公司 Video image augmentation method, network training method, electronic device and storage medium
CN114913196A (en) * 2021-12-28 2022-08-16 天翼数字生活科技有限公司 Attention-based dense optical flow calculation method
CN114677412A (en) * 2022-03-18 2022-06-28 苏州大学 Method, apparatus and device for optical flow estimation
CN114565880A (en) * 2022-04-28 2022-05-31 武汉大学 Method, system and equipment for detecting counterfeit video based on optical flow tracking
CN115018888A (en) * 2022-07-04 2022-09-06 东南大学 Optical flow unsupervised estimation method based on Transformer
CN115170826A (en) * 2022-07-08 2022-10-11 杭州电子科技大学 Local search-based fast optical flow estimation method for small moving target and storage medium
CN115272423A (en) * 2022-09-19 2022-11-01 深圳比特微电子科技有限公司 Method and device for training optical flow estimation model and readable storage medium
CN115690170A (en) * 2022-10-08 2023-02-03 苏州大学 Method and system for self-adaptive optical flow estimation aiming at different-scale targets
CN115731263A (en) * 2022-10-28 2023-03-03 苏州工业园区服务外包职业学院 Optical flow calculation method, system, device and medium fusing shift window attention
CN115830090A (en) * 2022-12-01 2023-03-21 大连理工大学 A self-supervised monocular depth prediction training method based on pixel matching to predict camera pose
CN115861384A (en) * 2023-02-27 2023-03-28 广东工业大学 Optical flow estimation method and system based on generation of countermeasure and attention mechanism
CN116091793A (en) * 2023-02-27 2023-05-09 南京邮电大学 Light field significance detection method based on optical flow fusion
CN116205953A (en) * 2023-04-12 2023-06-02 华中科技大学 Optical flow estimation method and device based on hierarchical total-correlation cost body aggregation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SS-SF: Piecewise 3D Scene Flow Estimation With Semantic Segmentation; Cheng Feng et al.; IEEE Access; full text *

Also Published As

Publication number Publication date
CN116486107A (en) 2023-07-25

Similar Documents

Publication Publication Date Title
Liu et al. Flownet3d: Learning scene flow in 3d point clouds
CN110782490B (en) Video depth map estimation method and device with space-time consistency
CN110866953B (en) Map construction method and device, and positioning method and device
CN105654492B (en) Robust real-time three-dimensional method for reconstructing based on consumer level camera
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
CN110543841A (en) Pedestrian re-identification method, system, electronic device and medium
CN111354030B (en) Unsupervised Monocular Image Depth Map Generation Method Embedded with SENet Unit
CN114677412A (en) 2022-06-28 Method, apparatus and device for optical flow estimation
CN115761594B (en) A Computational Method of Optical Flow Based on Global and Local Coupling
CN114005078B (en) Vehicle weight identification method based on double-relation attention mechanism
CN115239763A (en) Planar target tracking method based on central point detection and graph matching
CN114723787A (en) Optical flow calculation method and system
CN113850900A (en) Method and system for recovering depth map based on image and geometric clue in three-dimensional reconstruction
CN111612825A (en) Motion occlusion detection method for image sequences based on optical flow and multi-scale context
CN116486107B (en) Optical flow calculation method, system, equipment and medium
CN115471651A (en) 4D target segmentation method based on point cloud space-time memory network
CN117237656A (en) Decoupling propagation and cascading optimization optical flow estimation method based on confidence guidance
CN116912296A (en) Point cloud registration method based on position-enhanced attention mechanism
CN116912488A (en) Three-dimensional panorama segmentation method and device based on multi-view camera
CN116342675A (en) Real-time monocular depth estimation method, system, electronic equipment and storage medium
CN111626298B (en) Real-time image semantic segmentation device and segmentation method
CN117115855A (en) Human body posture estimation method and system based on multi-scale transducer learning rich visual features
CN114926734A (en) Solid waste detection device and method based on feature aggregation and attention fusion
Zhang et al. TAPIP3D: Tracking Any Point in Persistent 3D Geometry
CN111598841B (en) Example significance detection method based on regularized dense connection feature pyramid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant