CN105956597A - Binocular stereo matching method based on convolution neural network
- Publication number: CN105956597A
- Application number: CN201610296770.4A
- Authority: CN (China)
- Prior art keywords: image, training, dataset, neg, network
- Prior art date: 2016-05-04
- Legal status: Pending
Classifications
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06V10/757—Matching configurations of points or features
Abstract
The invention discloses a binocular stereo matching method based on a convolutional neural network. Two convolutional sub-networks first extract features from the small image patches to be matched; through the automatic learning ability of the convolutional neural network, robust and diverse features are extracted automatically, avoiding the complex feature selection and hand-crafted feature extraction of traditional stereo matching methods. The output features of the two sub-networks are then concatenated and fed into fully connected layers to compute the matching cost, which is better than that of traditional stereo matching methods. Combined with disparity post-processing, the method effectively produces a high-precision disparity map with good real-time performance.
Description
Technical Field
The present invention relates to the technical field of binocular stereo vision image processing, and in particular to a binocular stereo matching method using a convolutional neural network.
Background
Since Marr established the theoretical framework of computational vision in the early 1980s, binocular stereo vision has been a research hotspot in machine vision and has been widely studied in aerial surveying and mapping, medical imaging, virtual reality, and industrial inspection. Binocular stereo vision is based on the parallax principle: imaging devices capture two images of a measured object from different positions, and the three-dimensional geometric information of the object is obtained by computing the positional offset between corresponding image points. A binocular stereo vision pipeline mainly comprises five parts: image acquisition, camera calibration, image rectification, stereo matching, and 3D reconstruction. Stereo matching is the core of the pipeline, and the quality of the disparity map it produces directly affects the 3D reconstruction. Traditional stereo matching methods fall into three main categories: feature-based, local, and global matching algorithms. Feature-based algorithms yield sparse disparity maps, so a dense disparity map must be obtained by interpolation. Local algorithms are fast but match poorly in low-texture regions and at depth discontinuities. Global algorithms achieve higher-precision matching results but are computationally slow.
Summary of the Invention
In order to obtain a high-precision dense disparity map with good real-time performance, the present invention provides a binocular stereo matching method based on a convolutional neural network.
The object of the present invention is achieved through the following technical solution: a binocular stereo matching method based on a convolutional neural network, comprising the following steps:
(1) Image preprocessing. Apply Z-score normalization separately to the left and right images of a stereo image pair with a reference disparity map.
(2) Construct training instances. From the preprocessed left image, select a small n×n patch P_L(p) centered at p=(x,y); from the preprocessed right image, select a small n×n patch P_R(q) centered at q=(x-d,y). Together, P_L(p) and P_R(q) form one training instance.

For each position in the left image whose reference disparity value d is known, one correct training instance and one incorrect training instance are extracted.

To obtain a correct training instance, the center of the right patch P_R is set at:

q = (x - d + o_pos, y)

where o_pos is drawn at random from [-dataset_pos, dataset_pos], and dataset_pos is a positive integer.

To obtain an incorrect training instance, the center of the right patch P_R is set at:

q = (x - d + o_neg, y)

where o_neg is drawn at random from [-dataset_neg_high, -dataset_neg_low] or [dataset_neg_low, dataset_neg_high]; dataset_neg_low and dataset_neg_high are both positive integers.
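As an illustration of this sampling scheme, the following is a minimal numpy sketch. The function names and default parameter values are hypothetical, and integer offsets are used for simplicity (the embodiment below draws real-valued offsets, which would additionally require interpolated patch extraction); patches are assumed to lie fully inside the image.

```python
import numpy as np

def extract_patch(img, y, x, n):
    """Return the n-by-n patch of img centered at (x, y); n is assumed odd."""
    r = n // 2
    return img[y - r:y + r + 1, x - r:x + r + 1]

def make_training_pair(left, right, x, y, d, n=9,
                       dataset_pos=1, dataset_neg_low=2, dataset_neg_high=18):
    """Build one correct and one incorrect training instance for a left-image
    position p = (x, y) with known reference disparity d."""
    p_left = extract_patch(left, y, x, n)

    # Correct instance: right patch centered near the true match x - d.
    o_pos = np.random.randint(-dataset_pos, dataset_pos + 1)
    p_right_pos = extract_patch(right, y, x - d + o_pos, n)

    # Incorrect instance: center offset away from the true match, drawn from
    # [-dataset_neg_high, -dataset_neg_low] or [dataset_neg_low, dataset_neg_high].
    o_neg = np.random.randint(dataset_neg_low, dataset_neg_high + 1)
    o_neg *= np.random.choice([-1, 1])
    p_right_neg = extract_patch(right, y, x - d + o_neg, n)

    return (p_left, p_right_pos), (p_left, p_right_neg)
```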
(3) Construct the convolutional neural network structure used to compute the matching cost. First, two identical sub-networks are constructed; each consists of two convolutional layers and one fully connected layer, with each layer followed by a ReLU layer. The outputs of the two sub-networks are then concatenated and passed through two fully connected layers, each followed by a ReLU layer; a final fully connected layer is followed by a sigmoid transfer function. For each input pair (P_L(p), P_R(q)), the output of the network is denoted s(P_L(p), P_R(q)).
(4) Train the network. Following step (2), N/2 correct training instances and N/2 incorrect training instances are constructed each time and used to train the network constructed in step (3) with the supervised backpropagation algorithm, yielding the trained network; N is the number of training examples.
(5) Compute the disparity map. Take a stereo image pair from the test set and apply the preprocessing of step (1). Using the network trained in step (4), for each position p=(x,y) in the left image, compute its matching cost C_CNN(p,d) against the right image at position q=(x-d,y), where d ∈ (0, DISP_MAX) and DISP_MAX is the maximum possible disparity value:

C_CNN(p,d) = s(P_L(p), P_R(q)), with q = (x - d, y)

For each position p=(x,y) in the left image, the disparity D(p) is the value of d at which the matching cost above is minimized:

D(p) = argmin_d C_CNN(p,d)
(6) Post-process the disparity map. This comprises the following sub-steps:
(6.1) Sub-pixel disparity. Fit a quadratic curve through the matching costs obtained in step (5) and take its extremum to obtain the sub-pixel disparity map D_SE(p):

D_SE(p) = d - (C+ - C-) / (2(C+ - 2C + C-))

where d = D(p), C- = C_CNN(p, d-1), C = C_CNN(p, d), C+ = C_CNN(p, d+1);
(6.2) Apply median filtering and bilinear filtering to the sub-pixel disparity map D_SE(p) to obtain the final disparity map D_final(p).
Further, in step (1), the Z-score normalization proceeds as follows:

Compute the mean x_average and standard deviation σ of all pixel values in image X:

x_average = (1 / (W·H)) · Σ_{i,j} x_ij

σ = sqrt( (1 / (W·H)) · Σ_{i,j} (x_ij - x_average)² )

where W×H is the size of image X.

Each pixel value is then normalized, giving a new image X′ whose pixel values are:

x′_ij = (x_ij - x_average) / σ
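A minimal numpy sketch of this normalization; the function name is illustrative.

```python
import numpy as np

def z_score_normalize(img):
    """Z-score normalization of step (1): subtract the mean of all W*H
    pixel values and divide by their standard deviation."""
    img = img.astype(np.float64)
    return (img - img.mean()) / img.std()
```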
Further, in step (4), the cost function for training the network is the binary cross-entropy loss:

L = -(1/N) · Σ_{i=1..N} [ y_i · log(p_i) + (1 - y_i) · log(1 - p_i) ]

where N is the number of training examples, y_i is the label of the i-th example, and p_i is the predicted value for the i-th example.
Further, in step (4), in the cost function for training the network, the label value is 0 when the i-th instance is a correct instance and 1 when the i-th instance is an incorrect instance.
The beneficial effects of the present invention are as follows. Two convolutional sub-networks first extract features from the small image patches to be matched; through the automatic learning ability of the convolutional neural network, robust and diverse features are extracted automatically, avoiding the complex feature selection and hand-crafted feature extraction of traditional stereo matching methods. The output features of the two sub-networks are then concatenated and fed into fully connected layers to compute the matching cost, which is better than that of traditional stereo matching methods. Combined with several disparity post-processing steps, the method effectively produces a high-precision disparity map with good real-time performance.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of constructing training instances;
Fig. 2 is a schematic diagram of the convolutional neural network used to compute the matching cost of the points to be matched;
Fig. 3 is a schematic diagram of finding the extremum of the quadratic curve.
Detailed Description
The present invention is further described below with reference to the accompanying drawings and an implementation example.
The binocular stereo matching method based on a convolutional neural network provided by the present invention comprises the following steps:
(1) Image preprocessing. Apply Z-score normalization separately to the left and right images of 10 stereo image pairs with reference disparity maps: for each image, compute the mean x_average and standard deviation σ of all pixel values. For example, for an image X of size W×H:

x_average = (1 / (W·H)) · Σ_{i,j} x_ij

σ = sqrt( (1 / (W·H)) · Σ_{i,j} (x_ij - x_average)² )

Each pixel value is then normalized, giving a new image X′ whose pixel values are:

x′_ij = (x_ij - x_average) / σ
(2) Construct training instances. From the preprocessed left image, select a small 9×9 patch P_L(p) centered at p=(x,y); from the preprocessed right image, select a small 9×9 patch P_R(q) centered at q=(x-d,y). Together, P_L(p) and P_R(q) form one training instance, as shown in Fig. 1.
For each position where the reference disparity value d is known, we extract one correct training instance and one incorrect training instance. To obtain a correct training instance, the center of the right patch P_R is set at:

q = (x - d + o_pos, y)

where o_pos is drawn at random from [-0.5, 0.5].

To obtain an incorrect training instance, the center of the right patch P_R is set at:

q = (x - d + o_neg, y)

where o_neg is drawn at random from [-18, -1.5] or [1.5, 18].
(3) Construct the convolutional neural network structure used to compute the matching cost. First, two identical sub-networks are constructed; each consists of two convolutional layers and one fully connected layer, with each layer followed by a ReLU layer. The convolution kernels are 3×3, each convolutional layer has 32 kernels, and the fully connected layer has 200 units. The outputs of the two sub-networks are then concatenated into a vector of length 400. Next come two fully connected layers of 300 units each, each followed by a ReLU layer. Finally, a fully connected layer with a single unit is attached, followed by a sigmoid transfer function; the sigmoid output is the output of the network. As shown in Fig. 2, for each input pair (P_L(p), P_R(q)), the output of the network is denoted s(P_L(p), P_R(q)).
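As a concrete illustration, here is a minimal PyTorch sketch of this architecture. It assumes the two identical sub-networks share weights (the patent does not say whether the weights are tied) and that patches are single-channel 9×9 tensors; the class name is illustrative. Concatenating the two 200-unit sub-network outputs yields the 400-dimensional vector described above, and the single sigmoid unit produces the matching cost s(·,·) in (0, 1).

```python
import torch
import torch.nn as nn

class MatchingCostNet(nn.Module):
    """Sketch of the step-(3) architecture: two weight-shared sub-networks
    (two 3x3 conv layers with 32 kernels + one 200-unit FC, ReLU after each),
    whose concatenated outputs pass through two 300-unit FC+ReLU layers and
    a final 1-unit FC with a sigmoid."""

    def __init__(self, n=9):
        super().__init__()
        # Sub-network applied to each 9x9 patch (weights shared).
        self.sub = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3), nn.ReLU(),   # 9x9 -> 7x7
            nn.Conv2d(32, 32, kernel_size=3), nn.ReLU(),  # 7x7 -> 5x5
            nn.Flatten(),
            nn.Linear(32 * (n - 4) * (n - 4), 200), nn.ReLU(),
        )
        # Decision head on the concatenated 400-dim feature vector.
        self.head = nn.Sequential(
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, 300), nn.ReLU(),
            nn.Linear(300, 1), nn.Sigmoid(),
        )

    def forward(self, left_patch, right_patch):
        f = torch.cat([self.sub(left_patch), self.sub(right_patch)], dim=1)
        return self.head(f).squeeze(1)  # matching cost s(...) in (0, 1)
```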
(4) Train the network. Following step (2), 64 correct training instances and 64 incorrect training instances are constructed each time, with corresponding outputs Y_label = [y_label(1), y_label(2), …, y_label(128)], where the label of the i-th training instance satisfies: y_label(i) = 0 if the i-th instance is a correct instance, and y_label(i) = 1 if it is an incorrect instance.

These instances are used to train the network constructed in step (3) with the supervised backpropagation algorithm, and the loss is computed with the binary cross-entropy loss function:

L = -(1/128) · Σ_{i=1..128} [ y_label(i) · log(p_i) + (1 - y_label(i)) · log(1 - p_i) ]

where p_i is the network output (predicted value) for the i-th example.
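A hedged sketch of one such training step, reusing the MatchingCostNet sketch above. The random tensors stand in for real patch batches, and the SGD optimizer and learning rate are assumptions (the patent does not specify them); the label convention follows the text (0 = correct, 1 = incorrect), so the loss drives the output toward a low matching cost for true matches.

```python
import torch
import torch.nn as nn

net = MatchingCostNet()  # the sketch defined after step (3)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)  # optimizer and lr are assumptions
criterion = nn.BCELoss()

# Stand-in batch: 64 correct pairs followed by 64 incorrect pairs.
left_patches = torch.randn(128, 1, 9, 9)
right_patches = torch.randn(128, 1, 9, 9)
labels = torch.cat([torch.zeros(64), torch.ones(64)])  # 0 = correct, 1 = incorrect

optimizer.zero_grad()
predictions = net(left_patches, right_patches)
loss = criterion(predictions, labels)  # binary cross-entropy of step (4)
loss.backward()                        # supervised backpropagation
optimizer.step()
```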
(5) Compute the disparity map. Take a pair of images from the test set and apply the preprocessing of step (1). Using the network trained in step (4), for each position p=(x,y) in the left image, compute its matching cost against the right image at position q=(x-d,y), where d ∈ (0, 30) and 30 is the maximum possible disparity value:

C_CNN(p,d) = s(P_L(p), P_R(q)), with q = (x - d, y)

For each position p=(x,y) in the left image, the disparity D(p) is the value of d at which the matching cost above is minimized:

D(p) = argmin_d C_CNN(p,d)
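A brute-force sketch of this search, assuming grayscale images as 2-D tensors and the MatchingCostNet sketch above; a practical implementation would evaluate the sub-network convolutionally over the whole images rather than patch by patch.

```python
import torch

def disparity_map(net, left, right, disp_max=30, n=9):
    """Evaluate C_CNN(p, d) for every left-image position and every
    d in [0, disp_max), then take the argmin over d."""
    H, W = left.shape
    r = n // 2
    cost = torch.full((disp_max, H, W), float('inf'))
    with torch.no_grad():
        for d in range(disp_max):
            for y in range(r, H - r):
                for x in range(r + d, W - r):  # keep the right patch in bounds
                    pl = left[y - r:y + r + 1, x - r:x + r + 1]
                    pr = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                    cost[d, y, x] = net(pl[None, None], pr[None, None])
    return cost.argmin(dim=0), cost  # D(p) and the cost volume
```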
(6) Post-process the disparity map. This comprises the following sub-steps:
(6.1) Sub-pixel disparity. Fit a quadratic curve through the matching costs obtained in step (5), as shown in Fig. 3; taking its extremum gives the sub-pixel disparity D_SE(p):

D_SE(p) = d - (C+ - C-) / (2(C+ - 2C + C-))

where d = D(p), C- = C_CNN(p, d-1), C = C_CNN(p, d), C+ = C_CNN(p, d+1).
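A numpy sketch of this refinement over a full cost volume, assuming the cost volume and the integer disparity map are numpy arrays (e.g. via .numpy() from the previous sketch); the clipping at the disparity borders and the guard against a zero denominator are assumptions added for robustness.

```python
import numpy as np

def subpixel_disparity(cost, D):
    """Refine integer disparities D (H x W) using the cost volume
    cost (disp_max x H x W): fit a parabola through the costs at
    d-1, d, d+1 and take its extremum."""
    d = np.clip(D, 1, cost.shape[0] - 2)  # keep d-1 and d+1 in range
    H, W = D.shape
    yy, xx = np.mgrid[0:H, 0:W]
    c_minus, c, c_plus = cost[d - 1, yy, xx], cost[d, yy, xx], cost[d + 1, yy, xx]
    denom = 2.0 * (c_plus - 2.0 * c + c_minus)
    denom[denom == 0] = np.finfo(float).eps  # guard against a flat parabola
    return d - (c_plus - c_minus) / denom    # D_SE(p)
```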
(6.2) Apply median filtering and bilinear filtering to the disparity map D_SE(p) to obtain the final disparity map D_final(p).
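A loose sketch of this last step. The patent names median filtering and bilinear filtering without window sizes or parameters, so the 5×5 median window is an assumption and a small uniform (box) filter stands in for the unspecified smoothing stage.

```python
from scipy.ndimage import median_filter, uniform_filter

def postprocess(d_se):
    """Median filtering followed by a smoothing pass, yielding D_final(p)."""
    d = median_filter(d_se, size=5)   # window size is an assumption
    d = uniform_filter(d, size=3)     # box-filter stand-in for the smoothing stage
    return d
```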
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited thereto. Any person skilled in the art may make appropriate changes or variations within the technical scope disclosed herein, and such changes or variations shall all fall within the protection scope of the present invention.
Claims (4)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610296770.4A | 2016-05-04 | 2016-05-04 | Binocular stereo matching method based on convolution neural network |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN105956597A | 2016-09-21 |

- Family ID: 56914134

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201610296770.4A | Binocular stereo matching method based on convolution neural network | 2016-05-04 | 2016-05-04 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN105956597A (en) |
Citations (3)

Patent Citations (3)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN104408435A | 2014-12-05 | 2015-03-11 | Face identification method based on random pooling convolutional neural network |
| CN104809443A | 2015-05-05 | 2015-07-29 | Convolutional neural network-based license plate detection method and system |
| CN105426914A | 2015-11-19 | 2016-03-23 | Image similarity detection method for position recognition |

Non-Patent Citations (1)

- Jure Zbontar et al., "Stereo matching by training a convolutional neural network to compare image patches", The Journal of Machine Learning Research.
Legal Events

| Code | Title | Description |
|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20160921 |