
CN112950697B - Monocular unsupervised depth estimation method based on CBAM - Google Patents


Info

Publication number
CN112950697B
Authority
CN
China
Prior art keywords
cbam
depth estimation
resblock
loss
image
Prior art date
Legal status
Active
Application number
CN202110142746.6A
Other languages
Chinese (zh)
Other versions
CN112950697A (en)
Inventor
潘树国 (Pan Shuguo)
魏建胜 (Wei Jiansheng)
高旺 (Gao Wang)
赵涛 (Zhao Tao)
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN202110142746.6A
Publication of CN112950697A
Application granted
Publication of CN112950697B
Legal status: Active (granted)

Classifications

    • G06T 7/593 Depth or shape recovery from multiple images, from stereo images
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/25 Fusion techniques
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]


Abstract

The invention discloses a CBAM-based monocular unsupervised depth estimation method. Depth estimation is one of the key technologies enabling a robot to perceive its surrounding environment. Depth estimation methods based on supervised learning process the range measurements obtained by sensors such as lidar and use them as ground truth for training, but this consumes a large amount of manpower and computing resources, which greatly limits their application across scenes. In the invention, a convolutional block attention module is introduced into an unsupervised-learning depth estimation framework; the network is trained on photometric reconstruction, disparity smoothness and left-right disparity consistency of stereo image pairs, and depth estimation with absolute scale is performed on monocular images. With the proposed method, the depth details of objects in the surrounding environment are preserved, the overall depth estimation accuracy is improved, and the generalization capability under cross-scene conditions is maintained.

Description

Monocular unsupervised depth estimation method based on CBAM
Technical Field
The invention belongs to the field of autonomous navigation and environment perception of intelligent agents, and particularly relates to a CBAM-based monocular unsupervised depth estimation method.
Background
To achieve safe and reliable autonomous navigation, an intelligent agent needs a complete environment perception capability, which includes depth estimation of its surroundings. Depth estimation based on three-dimensional lidar yields relatively accurate results, but the sensor is expensive and only sparse depth can be obtained. Depth estimation based on RGB-D cameras is simple to operate, but the depth range is limited and such cameras are difficult to use in outdoor environments. Depth estimation based on stereo cameras is not restricted to indoor or outdoor use, but it occupies substantial computational resources and its depth range is limited by the short baseline. Depth estimation based on monocular cameras can yield dense depth maps, but conventional monocular methods cannot produce true depth estimates due to the lack of absolute scale.
With the development of artificial intelligence, intelligent agents increasingly apply deep convolutional neural networks to environment perception tasks. Researchers first used supervised learning to recover the absolute scale of monocular cameras and thereby complete monocular dense depth estimation. However, supervised learning requires a large number of data samples with ground truth for training, which greatly restricts its generalization ability. Currently, unsupervised monocular depth estimation is favored by researchers for its simple and effective training manner and steadily improving accuracy, and various advanced network design ideas, such as attention mechanisms, multi-path connections and spatial search, have been applied to such models. Studying a monocular unsupervised depth estimation method with an attention mechanism, to realize high-precision dense depth perception of the surrounding environment by an intelligent agent, therefore has important scientific research value and practical significance.
Disclosure of Invention
To solve the above problems, the invention discloses a monocular unsupervised depth estimation method based on CBAM (Convolutional Block Attention Module), which introduces an attention mechanism into the depth estimation task, preserves the depth details of objects and improves the overall accuracy of depth estimation, thereby providing a foundation for the autonomous navigation and environment perception of intelligent agents.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
a CBAM-based monocular unsupervised depth estimation method, comprising the steps of:
step 1), introducing CBAM and Resblock to form Resblock-CBAM;
step 2), designing a depth estimation network with an attention mechanism based on a Resblock-CBAM;
and step 3), training the depth estimation network on photometric reconstruction, disparity smoothness and left-right disparity consistency of stereo image pairs, completing depth estimation of monocular images.
Further, the step 1) of introducing CBAM and Resblock to form Resblock-CBAM comprises the following specific steps:
a) Setting the channel attention sub-module and the spatial attention sub-module in the CBAM to be connected in sequence, then connecting the CBAM in parallel with a Resblock to form the Conventional Resblock-CBAM, whose output is given by formula (1):

$F_C = F_r + F_r''$ (1)

where $F_r$ is the output feature of the Resblock, $F_r''$ is the output feature of the CBAM spatial attention sub-module, and $F_C$ is the output feature of the Conventional Resblock-CBAM;

b) Setting the channel attention sub-module and the spatial attention sub-module in the CBAM to be connected in sequence, then connecting the CBAM in series with a Resblock to form the Modified Resblock-CBAM, whose output is given by formula (2):

$F_M = F_r''$ (2)

where $F_M$ is the output feature of the Modified Resblock-CBAM;

c) The channel attention sub-module and the spatial attention sub-module in the CBAM operate as in formula (3):

$F_r' = M_c(F_r) \otimes F_r, \qquad F_r'' = M_s(F_r') \otimes F_r'$ (3)

where $F_r'$ is the output feature of the CBAM channel attention sub-module, $M_c$ is the one-dimensional channel attention map, $M_s$ is the two-dimensional spatial attention map, and $\otimes$ denotes pixel-by-pixel multiplication;

The channel attention sub-module is computed as in formula (4):

$M_c(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))) = \sigma(\omega_1(\omega_0(F^c_{avg})) + \omega_1(\omega_0(F^c_{max})))$ (4)

where $\sigma$ denotes the sigmoid function, the MLP is a multi-layer perceptron with weights $\omega_0$ and $\omega_1$, and $F^c_{avg}$, $F^c_{max}$ are the channel descriptors produced by average pooling and max pooling respectively;

The spatial attention sub-module is computed as in formula (5):

$M_s(F) = \sigma(f^{7\times 7}([F^s_{avg}; F^s_{max}]))$ (5)

where $f^{7\times 7}$ denotes a convolution with a 7×7 filter, and $F^s_{avg}$, $F^s_{max}$ are the spatial descriptors produced by average pooling and max pooling respectively.
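For concreteness, the following is a minimal sketch of formulas (1) to (5) in TensorFlow/Keras, the framework named in the embodiment below. The MLP reduction ratio, the internal structure of the Resblock and all layer hyperparameters are illustrative assumptions, not the patented configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def channel_attention(x, reduction=16):
    # Formula (4): a shared MLP (weights w0, w1) over the average- and max-pooled
    # channel descriptors, merged by addition and squashed with a sigmoid.
    c = int(x.shape[-1])
    mlp = tf.keras.Sequential([
        layers.Dense(c // reduction, activation="relu"),  # w0
        layers.Dense(c),                                  # w1
    ])
    avg = layers.GlobalAveragePooling2D()(x)              # F_avg^c
    mx = layers.GlobalMaxPooling2D()(x)                   # F_max^c
    m_c = layers.Activation("sigmoid")(layers.Add()([mlp(avg), mlp(mx)]))
    return layers.Multiply()([x, layers.Reshape((1, 1, c))(m_c)])

def spatial_attention(x):
    # Formula (5): concatenate the per-pixel channel mean/max descriptors and
    # apply a 7x7 convolution followed by a sigmoid.
    desc = layers.Lambda(lambda t: tf.concat(
        [tf.reduce_mean(t, -1, keepdims=True),            # F_avg^s
         tf.reduce_max(t, -1, keepdims=True)], -1))(x)    # F_max^s
    m_s = layers.Conv2D(1, 7, padding="same", activation="sigmoid")(desc)
    return layers.Multiply()([x, m_s])

def cbam(x):
    # Formula (3): channel attention first, then spatial attention.
    return spatial_attention(channel_attention(x))

def resblock(x, filters):
    # A generic two-convolution residual block; the exact Resblock used by the
    # patent is not specified on this page, so this is a stand-in.
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    if int(x.shape[-1]) != filters:
        x = layers.Conv2D(filters, 1, padding="same")(x)
    return layers.Activation("relu")(layers.Add()([x, y]))

def conventional_resblock_cbam(x, filters):
    # Formula (1): CBAM in parallel around the Resblock output, F_C = F_r + F_r''.
    f_r = resblock(x, filters)
    return layers.Add()([f_r, cbam(f_r)])

def modified_resblock_cbam(x, filters):
    # Formula (2): CBAM in series after the Resblock, F_M = F_r''.
    return cbam(resblock(x, filters))
```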
Step 2), designing a depth estimation network with an attention mechanism based on the Resblock-CBAM, with the following specific steps:

a) Four Resblock-CBAM modules are used sequentially in the encoder of the depth estimation network: the first three are Conventional Resblock-CBAM and the fourth is Modified Resblock-CBAM;

b) Five skip connections are used between the encoder and the decoder of the depth estimation network: the first connects the first convolution layer of the encoder to the second up-convolution layer of the decoder; the second connects the first pooling layer to the third up-convolution layer; the third connects the first Conventional Resblock-CBAM to the fourth up-convolution layer; the fourth connects the second Conventional Resblock-CBAM to the fifth up-convolution layer; the fifth connects the third Conventional Resblock-CBAM to the sixth up-convolution layer; the Modified Resblock-CBAM feeds the decoder directly, without a skip connection.
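Building on the sketch above, one possible wiring of the encoder and decoder of steps 2a) and 2b) is shown below; up-convolution layers are numbered here from the deepest decoder stage outward so that the sixth receives the deepest skip, matching the list above. The filter counts, the downsampling positions and the single-scale disparity head are assumptions (the embodiment below trains at four scales).

```python
import tensorflow as tf
from tensorflow.keras import layers
# Reuses conventional_resblock_cbam / modified_resblock_cbam from the previous sketch.

def upconv(x, filters, skip=None):
    # One decoder stage: 2x upsampling, then concatenation with an encoder skip feature.
    x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same",
                               activation="elu")(x)
    return x if skip is None else layers.Concatenate()([x, skip])

def build_depth_network(input_shape=(256, 512, 3)):
    inp = layers.Input(shape=input_shape)
    # Encoder (step 2a): first conv, first pooling, then three Conventional
    # and one Modified Resblock-CBAM.
    c1 = layers.Conv2D(64, 7, strides=2, padding="same", activation="elu")(inp)  # 1/2
    p1 = layers.MaxPool2D(3, strides=2, padding="same")(c1)                      # 1/4
    e1 = conventional_resblock_cbam(layers.MaxPool2D(2)(p1), 64)                 # 1/8
    e2 = conventional_resblock_cbam(layers.MaxPool2D(2)(e1), 128)                # 1/16
    e3 = conventional_resblock_cbam(layers.MaxPool2D(2)(e2), 256)                # 1/32
    e4 = modified_resblock_cbam(layers.MaxPool2D(2)(e3), 512)                    # 1/64
    # Decoder (step 2b): up-convolutions 6..2 receive the five skip connections;
    # the Modified Resblock-CBAM output e4 enters the decoder with no skip.
    u6 = upconv(e4, 512, skip=e3)  # 5th skip: 3rd Conventional Resblock-CBAM
    u5 = upconv(u6, 256, skip=e2)  # 4th skip: 2nd Conventional Resblock-CBAM
    u4 = upconv(u5, 128, skip=e1)  # 3rd skip: 1st Conventional Resblock-CBAM
    u3 = upconv(u4, 64,  skip=p1)  # 2nd skip: 1st pooling layer
    u2 = upconv(u3, 32,  skip=c1)  # 1st skip: 1st convolution layer
    u1 = upconv(u2, 16)
    disp = layers.Conv2D(2, 3, padding="same", activation="sigmoid")(u1)  # L/R disparity
    return tf.keras.Model(inp, disp)
```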
Step 3), training the depth estimation network on photometric reconstruction, disparity smoothness and left-right disparity consistency of stereo image pairs, and performing monocular image depth estimation at test time, with the following specific steps:

a) The total training loss of the depth estimation network comprises the photometric reconstruction loss, the disparity smoothness loss and the left-right disparity consistency loss, as shown in formula (6):

$L = \sum_s L_s, \qquad L_s = \alpha_{ap}(L^l_{ap} + L^r_{ap}) + \alpha_{ds}(L^l_{ds} + L^r_{ds}) + \alpha_{lr}(L^l_{lr} + L^r_{lr})$ (6)

where $L$ is the total training loss of the depth estimation network, $L_s$ is the training loss at each scale, $\alpha_{ap}$, $\alpha_{ds}$ and $\alpha_{lr}$ are the weight coefficients of the photometric reconstruction loss, the disparity smoothness loss and the left-right disparity consistency loss respectively, $L^l_{ap}$, $L^r_{ap}$ are the photometric reconstruction losses of the left and right images, $L^l_{ds}$, $L^r_{ds}$ are the disparity smoothness losses of the left and right images, and $L^l_{lr}$, $L^r_{lr}$ are the disparity consistency losses of the left and right images;

b) The image photometric reconstruction loss measures the difference between the input source image and its corresponding reconstructed image; for the left image it is shown in formula (7):

$L^l_{ap} = \frac{1}{N}\sum_{i,j}\left[\alpha_1\,\frac{1-\mathrm{SSIM}(I^l_{ij}, \tilde{I}^l_{ij})}{2} + (1-\alpha_1)\,\lVert I^l_{ij} - \tilde{I}^l_{ij}\rVert\right]$ (7)

where $L^l_{ap}$ is the left-image photometric reconstruction loss, $N$ is the number of pixels in a single image, $I^l$ is the left image, $\tilde{I}^l$ is the reconstructed left image, SSIM is the structural similarity function, and $\alpha_1$ is the scale parameter of the SSIM term; the right-image photometric reconstruction loss $L^r_{ap}$ takes the same form as $L^l_{ap}$;

c) The disparity smoothness loss suppresses steep, discontinuous changes of the depth map at image gradients, as shown in formula (8):

$L^l_{ds} = \frac{1}{N}\sum_{i,j}\left(\lvert\partial_x d^l_{ij}\rvert\, e^{-\lVert\partial_x I^l_{ij}\rVert} + \lvert\partial_y d^l_{ij}\rvert\, e^{-\lVert\partial_y I^l_{ij}\rVert}\right)$ (8)

where $L^l_{ds}$ is the disparity smoothness loss of the left image and $d^l$ is the disparity map of the left image; the right-image disparity smoothness loss $L^r_{ds}$ takes the same form as $L^l_{ds}$;

d) The left-right disparity consistency loss improves the network's estimation accuracy for the depth map; for the left image it is shown in formula (9):

$L^l_{lr} = \frac{1}{N}\sum_{i,j}\left[\alpha_2\,\frac{1-\mathrm{SSIM}(d^l_{ij}, d^r_{ij+d^l_{ij}})}{2} + (1-\alpha_2)\,\lvert d^l_{ij} - d^r_{ij+d^l_{ij}}\rvert\right]$ (9)

where $L^l_{lr}$ is the disparity consistency loss of the left image, $\alpha_2$ is the scale parameter of the SSIM term, and $d^r_{ij+d^l_{ij}}$ is the right-image disparity map sampled at the coordinates given by the left disparity (i.e., mapped between the views); the right-image disparity consistency loss $L^r_{lr}$ takes the same form as $L^l_{lr}$.
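Under the same assumptions, the three loss terms could be sketched as follows. The bilinear warping that produces the reconstructed image and the cross-view disparity map is part of the training pipeline of FIG. 1 and is omitted here; the SSIM-weighted form of formula (9) follows the variable definitions above, and the default weights are those stated in the embodiment below.

```python
import tensorflow as tf

def photometric_loss(img, recon, alpha1=0.85):
    # Formula (7): SSIM term weighted by alpha_1 plus an L1 term weighted by (1 - alpha_1).
    ssim = tf.image.ssim(img, recon, max_val=1.0)
    l1 = tf.reduce_mean(tf.abs(img - recon))
    return alpha1 * tf.reduce_mean((1.0 - ssim) / 2.0) + (1.0 - alpha1) * l1

def smoothness_loss(disp, img):
    # Formula (8): disparity gradients, exponentially down-weighted at image edges.
    dx_d = tf.abs(disp[:, :, 1:, :] - disp[:, :, :-1, :])
    dy_d = tf.abs(disp[:, 1:, :, :] - disp[:, :-1, :, :])
    dx_i = tf.reduce_mean(tf.abs(img[:, :, 1:, :] - img[:, :, :-1, :]), -1, keepdims=True)
    dy_i = tf.reduce_mean(tf.abs(img[:, 1:, :, :] - img[:, :-1, :, :]), -1, keepdims=True)
    return tf.reduce_mean(dx_d * tf.exp(-dx_i)) + tf.reduce_mean(dy_d * tf.exp(-dy_i))

def lr_consistency_loss(disp_l, disp_r_to_l, alpha2=0.15):
    # Formula (9): left disparity vs. the right disparity map warped into the left view
    # (disp_r_to_l is assumed to be produced by the omitted bilinear sampler).
    ssim = tf.image.ssim(disp_l, disp_r_to_l, max_val=1.0)
    l1 = tf.reduce_mean(tf.abs(disp_l - disp_r_to_l))
    return alpha2 * tf.reduce_mean((1.0 - ssim) / 2.0) + (1.0 - alpha2) * l1
```

Per formula (6), these per-image terms would be computed for both views and summed over the four output scales with the weights α_ap, α_ds and α_lr.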
The beneficial effects of the invention are as follows:

The CBAM-based monocular unsupervised depth estimation method of the invention introduces the convolutional block attention module CBAM into an unsupervised depth estimation framework to realize dense depth estimation from monocular images. When introducing CBAM into the depth estimation network, CBAM and Resblock are combined into the Resblock-CBAM, which extracts features from the input along both the spatial and channel dimensions; meanwhile, skip connections are adopted to fuse multi-scale information. With the proposed method, the attention mechanism is integrated into the depth estimation network and unsupervised training based on stereo image pairs (photometric reconstruction, disparity smoothness, left-right disparity consistency) is performed, so that the depth details of objects in the environment are preserved and the overall depth estimation accuracy is improved.
Drawings
FIG. 1 is an unsupervised depth estimation framework diagram;
FIG. 2 is a schematic diagram of a combination of a residual block and a convolution block attention module;
FIG. 3 is a schematic diagram of a sub-module in the convolution block attention module;
FIG. 4 is a diagram of a depth estimation network architecture;
FIG. 5 is a depth estimation visual quality assessment graph;
FIG. 6 is a diagram of a depth estimation experiment platform;
FIG. 7 is a view of a real city scene depth estimation;
FIG. 8 is a table of depth estimation accuracy comparisons.
Detailed Description
The present invention is further illustrated in the following drawings and detailed description, which are to be regarded as illustrative in nature and not as restrictive.
In the CBAM-based monocular unsupervised depth estimation method, as shown in FIG. 1, CBAM and Resblock are first combined to form the Resblock-CBAM; a depth estimation network with an attention mechanism is then designed based on the Resblock-CBAM; finally, the depth estimation network is trained on photometric reconstruction, disparity smoothness and left-right disparity consistency of stereo image pairs, completing depth estimation of monocular images. The specific steps are as follows:
Step 1), introducing CBAM and Resblock to form the Resblock-CBAM, with the following specific steps:

a) Setting the channel attention sub-module and the spatial attention sub-module in the CBAM to be connected in sequence, then connecting the CBAM in parallel with a Resblock to form the Conventional Resblock-CBAM, whose output is given by formula (1):

$F_C = F_r + F_r''$ (1)

where $F_r$ is the output feature of the Resblock, $F_r''$ is the output feature of the CBAM spatial attention sub-module, and $F_C$ is the output feature of the Conventional Resblock-CBAM;

b) Setting the channel attention sub-module and the spatial attention sub-module in the CBAM to be connected in sequence, then connecting the CBAM in series with a Resblock to form the Modified Resblock-CBAM, whose output is given by formula (2):

$F_M = F_r''$ (2)

where $F_M$ is the output feature of the Modified Resblock-CBAM;

c) The channel attention sub-module and the spatial attention sub-module in the CBAM operate as in formula (3):

$F_r' = M_c(F_r) \otimes F_r, \qquad F_r'' = M_s(F_r') \otimes F_r'$ (3)

where $F_r'$ is the output feature of the CBAM channel attention sub-module, $M_c$ is the one-dimensional channel attention map, $M_s$ is the two-dimensional spatial attention map, and $\otimes$ denotes pixel-by-pixel multiplication;

The channel attention sub-module is computed as in formula (4):

$M_c(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))) = \sigma(\omega_1(\omega_0(F^c_{avg})) + \omega_1(\omega_0(F^c_{max})))$ (4)

where $\sigma$ denotes the sigmoid function, the MLP is a multi-layer perceptron with weights $\omega_0$ and $\omega_1$, and $F^c_{avg}$, $F^c_{max}$ are the channel descriptors produced by average pooling and max pooling respectively;

The spatial attention sub-module is computed as in formula (5):

$M_s(F) = \sigma(f^{7\times 7}([F^s_{avg}; F^s_{max}]))$ (5)

where $f^{7\times 7}$ denotes a convolution with a 7×7 filter, and $F^s_{avg}$, $F^s_{max}$ are the spatial descriptors produced by average pooling and max pooling respectively.
Step 2), designing a depth estimation network with an attention mechanism based on the Resblock-CBAM, with the following specific steps:

a) Four Resblock-CBAM modules are used sequentially in the encoder of the depth estimation network: the first three are Conventional Resblock-CBAM and the fourth is Modified Resblock-CBAM;

b) Five skip connections are used between the encoder and the decoder of the depth estimation network: the first connects the first convolution layer of the encoder to the second up-convolution layer of the decoder; the second connects the first pooling layer to the third up-convolution layer; the third connects the first Conventional Resblock-CBAM to the fourth up-convolution layer; the fourth connects the second Conventional Resblock-CBAM to the fifth up-convolution layer; the fifth connects the third Conventional Resblock-CBAM to the sixth up-convolution layer; the Modified Resblock-CBAM feeds the decoder directly, without a skip connection.
Step 3), training the depth estimation network on photometric reconstruction, disparity smoothness and left-right disparity consistency of stereo image pairs, and performing monocular image depth estimation at test time, with the following specific steps:

a) The total training loss of the depth estimation network comprises the photometric reconstruction loss, the disparity smoothness loss and the left-right disparity consistency loss, as shown in formula (6):

$L = \sum_s L_s, \qquad L_s = \alpha_{ap}(L^l_{ap} + L^r_{ap}) + \alpha_{ds}(L^l_{ds} + L^r_{ds}) + \alpha_{lr}(L^l_{lr} + L^r_{lr})$ (6)

where $L$ is the total training loss of the depth estimation network, $L_s$ is the training loss at each scale, $\alpha_{ap}$, $\alpha_{ds}$ and $\alpha_{lr}$ are the weight coefficients of the photometric reconstruction loss, the disparity smoothness loss and the left-right disparity consistency loss respectively, $L^l_{ap}$, $L^r_{ap}$ are the photometric reconstruction losses of the left and right images, $L^l_{ds}$, $L^r_{ds}$ are the disparity smoothness losses of the left and right images, and $L^l_{lr}$, $L^r_{lr}$ are the disparity consistency losses of the left and right images;

b) The image photometric reconstruction loss measures the difference between the input source image and its corresponding reconstructed image; for the left image it is shown in formula (7):

$L^l_{ap} = \frac{1}{N}\sum_{i,j}\left[\alpha_1\,\frac{1-\mathrm{SSIM}(I^l_{ij}, \tilde{I}^l_{ij})}{2} + (1-\alpha_1)\,\lVert I^l_{ij} - \tilde{I}^l_{ij}\rVert\right]$ (7)

where $L^l_{ap}$ is the left-image photometric reconstruction loss, $N$ is the number of pixels in a single image, $I^l$ is the left image, $\tilde{I}^l$ is the reconstructed left image, SSIM is the structural similarity function, and $\alpha_1$ is the scale parameter of the SSIM term; the right-image photometric reconstruction loss $L^r_{ap}$ takes the same form as $L^l_{ap}$;

c) The disparity smoothness loss suppresses steep, discontinuous changes of the depth map at image gradients, as shown in formula (8):

$L^l_{ds} = \frac{1}{N}\sum_{i,j}\left(\lvert\partial_x d^l_{ij}\rvert\, e^{-\lVert\partial_x I^l_{ij}\rVert} + \lvert\partial_y d^l_{ij}\rvert\, e^{-\lVert\partial_y I^l_{ij}\rVert}\right)$ (8)

where $L^l_{ds}$ is the disparity smoothness loss of the left image and $d^l$ is the disparity map of the left image; the right-image disparity smoothness loss $L^r_{ds}$ takes the same form as $L^l_{ds}$;

d) The left-right disparity consistency loss improves the network's estimation accuracy for the depth map; for the left image it is shown in formula (9):

$L^l_{lr} = \frac{1}{N}\sum_{i,j}\left[\alpha_2\,\frac{1-\mathrm{SSIM}(d^l_{ij}, d^r_{ij+d^l_{ij}})}{2} + (1-\alpha_2)\,\lvert d^l_{ij} - d^r_{ij+d^l_{ij}}\rvert\right]$ (9)

where $L^l_{lr}$ is the disparity consistency loss of the left image, $\alpha_2$ is the scale parameter of the SSIM term, and $d^r_{ij+d^l_{ij}}$ is the right-image disparity map sampled at the coordinates given by the left disparity; the right-image disparity consistency loss $L^r_{lr}$ takes the same form as $L^l_{lr}$.
In this embodiment, the unsupervised depth estimation framework runs in TensorFlow, and the network is trained on an NVIDIA GeForce RTX 2080 Ti graphics card with 11 GB of memory; training takes about 22 hours to converge. The relevant weight parameters in the training loss function are set as follows: $\alpha_{ap}=1$, $\alpha_{lr}=1$, $\alpha_1=0.85$, $\alpha_2=0.15$; since the unsupervised training on stereo image pairs (photometric reconstruction, disparity smoothness, left-right disparity consistency) is performed at four scales, $\alpha_{ds}$ is set to 1, 0.5, 0.25 and 0.125 at disp1, disp2, disp3 and disp4 respectively. To better measure the accuracy of depth estimation, five evaluation indices are defined as follows:
Abs Rel: $\frac{1}{T}\sum_{k}\frac{\lvert d_k - d_k^*\rvert}{d_k^*}$; Sq Rel: $\frac{1}{T}\sum_{k}\frac{\lVert d_k - d_k^*\rVert^2}{d_k^*}$; RMSE: $\sqrt{\frac{1}{T}\sum_{k}\lVert d_k - d_k^*\rVert^2}$;

RMSE log: $\sqrt{\frac{1}{T}\sum_{k}\lVert \log d_k - \log d_k^*\rVert^2}$; Threshold: % of $d_k$ such that $\max\left(\frac{d_k}{d_k^*}, \frac{d_k^*}{d_k}\right) = \delta < thr$, where $T$ is the pixel count of the test image and $d_k$, $d_k^*$ are the predicted depth and true depth of the k-th pixel, respectively.
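These five indices (together with the threshold accuracies a1, a2 and a3 reported in FIG. 8) reduce to a few NumPy lines; a minimal sketch, assuming pred and gt are arrays of predicted and ground-truth depths over the T valid test pixels:

```python
import numpy as np

def depth_metrics(pred, gt):
    # pred, gt: 1-D arrays of matched predicted / ground-truth depths (T valid pixels).
    abs_rel = np.mean(np.abs(pred - gt) / gt)                      # Abs Rel
    sq_rel = np.mean((pred - gt) ** 2 / gt)                        # Sq Rel
    rmse = np.sqrt(np.mean((pred - gt) ** 2))                      # RMSE
    rmse_log = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))  # RMSE log
    delta = np.maximum(pred / gt, gt / pred)
    a1, a2, a3 = [np.mean(delta < 1.25 ** k) for k in (1, 2, 3)]   # threshold accuracies
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
```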
An Eigen split test set containing 697 pictures from 29 scenes is selected to test the network trained on the KITTI dataset, and accuracy comparison and visual quality evaluation are carried out against existing methods. FIG. 8 compares the accuracy of the proposed method with other depth estimation methods on the Eigen split test set, where a1, a2 and a3 denote the accuracies under the thresholds $\delta<1.25$, $\delta<1.25^2$ and $\delta<1.25^3$ respectively. As shown in FIG. 8, the proposed method performs best among the compared unsupervised depth estimation methods; among the mainstream supervised depth estimation methods, it is inferior only to the attention-aggregation-network-based method ACA, and is superior to the other supervised methods trained with ground truth. FIG. 5 shows the visual quality evaluation of the proposed method against two mainstream unsupervised depth estimation methods; it can be seen that, after using the convolutional block attention module, the proposed method better preserves the depth details of objects in the environment, and its overall visual quality is better than that of the other two methods.
To better demonstrate the cross-scene generalization capability of unsupervised depth estimation over supervised depth estimation, the network trained on the KITTI dataset is subjected to a depth estimation experiment on several urban road scenes in Nanjing. FIG. 6 shows the experimental platform for depth estimation in the real environment, and the depth estimation results of the proposed method are shown in FIG. 7; the trained network achieves satisfactory visual quality of depth estimation in these unknown scenes and preserves the depth details of many nearby objects.
The technical means disclosed by the scheme of the invention are not limited to those disclosed in the above embodiment, but also include improvements and modifications based on the above technical features, which likewise fall within the protection scope of the invention.

Claims (5)

1. A CBAM-based monocular unsupervised depth estimation method, characterized in that the method comprises the following steps:
step 1), introducing CBAM and Resblock to form the Resblock-CBAM, with the following specific steps:

a) setting the channel attention sub-module and the spatial attention sub-module in the CBAM to be connected in sequence, then connecting the CBAM in parallel with a Resblock to form the Conventional Resblock-CBAM, whose output is given by formula (1):

$F_C = F_r + F_r''$ (1)

where $F_r$ is the output feature of the Resblock, $F_r''$ is the output feature of the CBAM spatial attention sub-module, and $F_C$ is the output feature of the Conventional Resblock-CBAM;

b) setting the channel attention sub-module and the spatial attention sub-module in the CBAM to be connected in sequence, then connecting the CBAM in series with a Resblock to form the Modified Resblock-CBAM, whose output is given by formula (2):

$F_M = F_r''$ (2)

where $F_M$ is the output feature of the Modified Resblock-CBAM;

c) the channel attention sub-module and the spatial attention sub-module in the CBAM operate as in formula (3):

$F_r' = M_c(F_r) \otimes F_r, \qquad F_r'' = M_s(F_r') \otimes F_r'$ (3)

where $F_r'$ is the output feature of the CBAM channel attention sub-module, $M_c$ is the one-dimensional channel attention map, $M_s$ is the two-dimensional spatial attention map, and $\otimes$ denotes pixel-by-pixel multiplication;
step 2), designing a depth estimation network with an attention mechanism based on the Resblock-CBAM;

and step 3), training the depth estimation network on photometric reconstruction, disparity smoothness and left-right disparity consistency of stereo image pairs, completing depth estimation of monocular images.
2. The CBAM-based monocular unsupervised depth estimation method according to claim 1, wherein the channel attention sub-module is computed as in formula (4):

$M_c(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))) = \sigma(\omega_1(\omega_0(F^c_{avg})) + \omega_1(\omega_0(F^c_{max})))$ (4)

where $\sigma$ denotes the sigmoid function, the MLP is a multi-layer perceptron with weights $\omega_0$ and $\omega_1$, and $F^c_{avg}$, $F^c_{max}$ are the channel descriptors produced by average pooling and max pooling respectively.
3. The CBAM-based monocular unsupervised depth estimation method according to claim 1, wherein the spatial attention sub-module is computed as in formula (5):

$M_s(F) = \sigma(f^{7\times 7}([F^s_{avg}; F^s_{max}]))$ (5)

where $f^{7\times 7}$ denotes a convolution with a 7×7 filter, and $F^s_{avg}$, $F^s_{max}$ are the spatial descriptors produced by average pooling and max pooling respectively.
4. The CBAM-based monocular unsupervised depth estimation method according to claim 1, wherein the depth estimation network with an attention mechanism in step 2) is designed based on the Resblock-CBAM with the following specific steps:

a) four Resblock-CBAM modules are used sequentially in the encoder of the depth estimation network: the first three are Conventional Resblock-CBAM and the fourth is Modified Resblock-CBAM;

b) five skip connections are used between the encoder and the decoder of the depth estimation network: the first connects the first convolution layer of the encoder to the second up-convolution layer of the decoder; the second connects the first pooling layer to the third up-convolution layer; the third connects the first Conventional Resblock-CBAM to the fourth up-convolution layer; the fourth connects the second Conventional Resblock-CBAM to the fifth up-convolution layer; the fifth connects the third Conventional Resblock-CBAM to the sixth up-convolution layer; the Modified Resblock-CBAM feeds the decoder directly, without a skip connection.
5. The CBAM-based monocular unsupervised depth estimation method according to claim 1, wherein step 3) trains the depth estimation network on photometric reconstruction, disparity smoothness and left-right disparity consistency of stereo image pairs and performs monocular image depth estimation at test time, with the following specific steps:

a) the total training loss of the depth estimation network comprises the photometric reconstruction loss, the disparity smoothness loss and the left-right disparity consistency loss, as shown in formula (6):

$L = \sum_s L_s, \qquad L_s = \alpha_{ap}(L^l_{ap} + L^r_{ap}) + \alpha_{ds}(L^l_{ds} + L^r_{ds}) + \alpha_{lr}(L^l_{lr} + L^r_{lr})$ (6)

where $L$ is the total training loss of the depth estimation network, $L_s$ is the training loss at each scale, $\alpha_{ap}$, $\alpha_{ds}$ and $\alpha_{lr}$ are the weight coefficients of the photometric reconstruction loss, the disparity smoothness loss and the left-right disparity consistency loss respectively, $L^l_{ap}$, $L^r_{ap}$ are the photometric reconstruction losses of the left and right images, $L^l_{ds}$, $L^r_{ds}$ are the disparity smoothness losses of the left and right images, and $L^l_{lr}$, $L^r_{lr}$ are the disparity consistency losses of the left and right images;

b) the image photometric reconstruction loss measures the difference between the input source image and its corresponding reconstructed image; for the left image it is shown in formula (7):

$L^l_{ap} = \frac{1}{N}\sum_{i,j}\left[\alpha_1\,\frac{1-\mathrm{SSIM}(I^l_{ij}, \tilde{I}^l_{ij})}{2} + (1-\alpha_1)\,\lVert I^l_{ij} - \tilde{I}^l_{ij}\rVert\right]$ (7)

where $L^l_{ap}$ is the left-image photometric reconstruction loss, $N$ is the number of pixels in a single image, $I^l$ is the left image, $\tilde{I}^l$ is the reconstructed left image, SSIM is the structural similarity function, and $\alpha_1$ is the scale parameter of the SSIM term; the right-image photometric reconstruction loss $L^r_{ap}$ takes the same form as $L^l_{ap}$;

c) the disparity smoothness loss suppresses steep, discontinuous changes of the depth map at image gradients, as shown in formula (8):

$L^l_{ds} = \frac{1}{N}\sum_{i,j}\left(\lvert\partial_x d^l_{ij}\rvert\, e^{-\lVert\partial_x I^l_{ij}\rVert} + \lvert\partial_y d^l_{ij}\rvert\, e^{-\lVert\partial_y I^l_{ij}\rVert}\right)$ (8)

where $L^l_{ds}$ is the disparity smoothness loss of the left image and $d^l$ is the disparity map of the left image; the right-image disparity smoothness loss $L^r_{ds}$ takes the same form as $L^l_{ds}$;

d) the left-right disparity consistency loss improves the network's estimation accuracy for the depth map; for the left image it is shown in formula (9):

$L^l_{lr} = \frac{1}{N}\sum_{i,j}\left[\alpha_2\,\frac{1-\mathrm{SSIM}(d^l_{ij}, d^r_{ij+d^l_{ij}})}{2} + (1-\alpha_2)\,\lvert d^l_{ij} - d^r_{ij+d^l_{ij}}\rvert\right]$ (9)

where $L^l_{lr}$ is the disparity consistency loss of the left image, $\alpha_2$ is the scale parameter of the SSIM term, and $d^r_{ij+d^l_{ij}}$ is the right-image disparity map sampled at the coordinates given by the left disparity; the right-image disparity consistency loss $L^r_{lr}$ takes the same form as $L^l_{lr}$.
CN202110142746.6A 2021-02-02 2021-02-02 Monocular unsupervised depth estimation method based on CBAM Active CN112950697B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110142746.6A CN112950697B (en) 2021-02-02 2021-02-02 Monocular unsupervised depth estimation method based on CBAM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110142746.6A CN112950697B (en) 2021-02-02 2021-02-02 Monocular unsupervised depth estimation method based on CBAM

Publications (2)

Publication Number Publication Date
CN112950697A CN112950697A (en) 2021-06-11
CN112950697B true CN112950697B (en) 2024-04-16

Family

ID=76241549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110142746.6A Active CN112950697B (en) 2021-02-02 2021-02-02 Monocular unsupervised depth estimation method based on CBAM

Country Status (1)

Country Link
CN (1) CN112950697B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739082A (en) * 2020-06-15 2020-10-02 大连理工大学 An Unsupervised Depth Estimation Method for Stereo Vision Based on Convolutional Neural Networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11138751B2 (en) * 2019-07-06 2021-10-05 Toyota Research Institute, Inc. Systems and methods for semi-supervised training using reprojected distance loss

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739082A (en) * 2020-06-15 2020-10-02 大连理工大学 An Unsupervised Depth Estimation Method for Stereo Vision Based on Convolutional Neural Networks

Also Published As

Publication number Publication date
CN112950697A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
CN109685842B (en) Sparse depth densification method based on multi-scale network
CN110322499B (en) Monocular image depth estimation method based on multilayer characteristics
CN111462329B (en) Three-dimensional reconstruction method of unmanned aerial vehicle aerial image based on deep learning
US20210142095A1 (en) Image disparity estimation
CN113936139B (en) Scene aerial view reconstruction method and system combining visual depth information and semantic segmentation
CN111696148A (en) End-to-end stereo matching method based on convolutional neural network
CN109902702A (en) Method and device for target detection
CN111325782A (en) Unsupervised monocular view depth estimation method based on multi-scale unification
CN110910327B (en) Unsupervised deep completion method based on mask enhanced network model
CN109376589A (en) Recognition method of ROV deformable target and small target based on convolution kernel screening SSD network
CN114396877B (en) Intelligent three-dimensional displacement field and strain field measurement method for mechanical properties of materials
CN112116646B (en) A light field image depth estimation method based on deep convolutional neural network
CN109522840A (en) A kind of expressway vehicle density monitoring calculation system and method
CN109801323A (en) Pyramid binocular depth with self-promotion ability estimates model
CN112270701B (en) Parallax prediction method, system and storage medium based on packet distance network
CN111105451A (en) A Binocular Depth Estimation Method for Driving Scenes Overcoming Occlusion Effect
CN116468769A (en) An Image-Based Depth Information Estimation Method
CN116222577A (en) Closed loop detection method, training method, system, electronic equipment and storage medium
CN116363529A (en) A remote sensing image target detection method based on improved lightweight YOLOv4
CN112950697B (en) Monocular unsupervised depth estimation method based on CBAM
CN119181067A (en) Multi-mode depth fusion 3D target detection method based on reliable depth estimation
CN119068114A (en) A method for 3D reconstruction of building automation
CN115496788A (en) Deep completion method using airspace propagation post-processing module
CN115222790A (en) Single photon three-dimensional reconstruction method, system, equipment and storage medium
CN115170654A (en) Vision-based method for estimating relative distance and direction of underwater bionic manta ray robotic fish

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant