CN112950697B - Monocular unsupervised depth estimation method based on CBAM - Google Patents
- Publication number: CN112950697B
- Application number: CN202110142746.6A
- Authority: CN (China)
- Prior art keywords: CBAM, depth estimation, Resblock, loss, image
- Prior art date: 2021-02-02
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/593 — Image analysis: depth or shape recovery from multiple images, from stereo images
- G06F18/214 — Pattern recognition: design or setup of recognition systems; generating training patterns, bootstrap methods, e.g. bagging or boosting
- G06F18/25 — Pattern recognition: fusion techniques
- G06N3/045 — Neural networks: combinations of networks
- G06N3/08 — Neural networks: learning methods
- G06T2207/10028 — Image acquisition modality: range image, depth image, 3D point clouds
- G06T2207/20081 — Special algorithmic details: training, learning
- G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
Abstract
The invention discloses a monocular unsupervised depth estimation method based on CBAM. Depth estimation is one of the key technologies that enable a robot to perceive its surrounding environment. Supervised depth estimation methods process range measurements obtained from sensors such as lidar and use them as ground truth for training, but this process consumes substantial manpower and computing resources, which severely limits the application of such methods across scenes. In the present invention, a convolutional block attention module is introduced into an unsupervised depth estimation framework; the network is trained with photometric reconstruction, disparity smoothing, and left-right disparity consistency losses on stereo image pairs, and performs scaled depth estimation on monocular images. With the proposed method, the depth details of objects in the surrounding environment are preserved, the overall depth estimation accuracy is improved, and generalization capability across scenes is retained.
Description
Technical Field
The invention belongs to the field of autonomous navigation and environment perception for intelligent agents, and particularly relates to a monocular unsupervised depth estimation method based on CBAM.
Background
An intelligent agent needs a complete environment perception capability to achieve safe and reliable autonomous navigation, which includes estimating the depth of its surroundings. Depth estimation based on 3D lidar yields accurate results, but the sensor is expensive and produces only sparse depth. Depth estimation based on RGB-D cameras is simple to operate, but the sensing range is limited and such cameras are difficult to use outdoors. Depth estimation based on stereo cameras works both indoors and outdoors, but it consumes substantial computational resources and its range is limited by the short baseline. Depth estimation based on a monocular camera can yield dense depth maps, but conventional monocular methods cannot recover true depth because the absolute scale is missing.
With the development of artificial intelligence, intelligent agents have gradually adopted deep convolutional neural networks for environment perception tasks. Researchers first used supervised learning to recover the absolute scale of monocular cameras and thereby complete monocular dense depth estimation. However, supervised learning requires a large number of training samples with ground truth, which greatly restricts its generalization ability. At present, unsupervised monocular depth estimation is favored by researchers for its simple and effective training scheme and steadily improving accuracy, and advanced network design ideas such as attention mechanisms, multipath connections, and architecture search have been applied to such models. Studying a monocular unsupervised depth estimation method with an attention mechanism, so that an intelligent agent can densely and accurately perceive the depth of its surroundings, therefore has important scientific and practical value.
Disclosure of Invention
In order to solve the above problems, the invention discloses a monocular unsupervised depth estimation method based on CBAM (Convolutional Block Attention Module), which introduces an attention mechanism into the depth estimation task, preserves the depth details of objects, and improves the overall accuracy of depth estimation, thereby providing a foundation for autonomous navigation and environment perception of an intelligent agent.
In order to achieve the above purpose, the technical scheme of the invention is as follows:
a CBAM-based monocular unsupervised depth estimation method, comprising the steps of:
step 1), combining CBAM and Resblock to form a Resblock-CBAM;
step 2), designing a depth estimation network with an attention mechanism based on the Resblock-CBAM;
and step 3), training the depth estimation network with photometric reconstruction, disparity smoothing, and left-right disparity consistency losses on stereo image pairs, and completing depth estimation of monocular images.
Further, combining CBAM and Resblock into the Resblock-CBAM in step 1) comprises the following specific steps:
a) The channel attention sub-module and the spatial attention sub-module in the CBAM are connected in sequence; the CBAM is then combined with a Resblock in parallel to form the Conventional Resblock-CBAM, whose output equation is shown in formula (1):
$F_C = F_r + F_r''$ (1)
where $F_r$ is the output feature of the Resblock, $F_r''$ is the output feature of the CBAM spatial attention sub-module, and $F_C$ is the output feature of the Conventional Resblock-CBAM;
b) The channel attention sub-module and the spatial attention sub-module in the CBAM are connected in sequence; the CBAM is then connected with a Resblock in series to form the Modified Resblock-CBAM, whose output equation is shown in formula (2):
$F_M = F_r''$ (2)
where $F_M$ is the output feature of the Modified Resblock-CBAM;
c) The overall process of the channel attention sub-module and the spatial attention sub-module in the CBAM is shown in formula (3):
$F_r' = M_c(F_r) \otimes F_r, \quad F_r'' = M_s(F_r') \otimes F_r'$ (3)
where $F_r'$ is the output feature of the CBAM channel attention sub-module, $M_c$ is the one-dimensional channel attention map, $M_s$ is the two-dimensional spatial attention map, and $\otimes$ denotes element-wise multiplication;
The specific process of the channel attention sub-module is shown in formula (4):
$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(\omega_1(\omega_0(F^c_{avg})) + \omega_1(\omega_0(F^c_{max}))\big)$ (4)
where $\sigma$ denotes the sigmoid function, MLP is a multi-layer perceptron, $\omega_0$ and $\omega_1$ are the weights of the multi-layer perceptron, and $F^c_{avg}$ and $F^c_{max}$ are the channel descriptors produced by average pooling and max pooling;
The specific process of the spatial attention sub-module is shown in formula (5):
$M_s(F) = \sigma\big(f^{7\times 7}([F^s_{avg}; F^s_{max}])\big)$ (5)
where $f^{7\times 7}$ denotes a convolution with a 7×7 filter, and $F^s_{avg}$ and $F^s_{max}$ are the spatial descriptors produced by average pooling and max pooling; an illustrative code sketch of this construction follows.
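For illustration, the following is a minimal sketch of the Resblock-CBAM construction. The embodiment described below is implemented in TensorFlow; PyTorch is used here purely for compactness, and the reduction ratio of 16 in the channel MLP, the externally supplied residual block, and the `mode` naming are assumptions rather than details taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Channel attention sub-module, formula (4): a shared MLP (weights w0, w1)
    applied to the max- and average-pooled channel descriptors."""
    def __init__(self, channels, reduction=16):  # reduction ratio is an assumption
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1).view(b, c))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1).view(b, c))
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    """Spatial attention sub-module, formula (5): 7x7 convolution over the
    concatenated channel-wise average- and max-pooled maps."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Formula (3): channel attention followed by spatial attention."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = self.ca(x) * x     # F_r'  = M_c(F_r)  (x) F_r
        return self.sa(x) * x  # F_r'' = M_s(F_r') (x) F_r'

class ResblockCBAM(nn.Module):
    """Formulas (1)-(2) under one reading: 'conventional' adds the CBAM branch
    to the Resblock output (F_C = F_r + F_r''); 'modified' keeps only the CBAM
    output (F_M = F_r'')."""
    def __init__(self, resblock, channels, mode="conventional"):
        super().__init__()
        self.resblock, self.cbam, self.mode = resblock, CBAM(channels), mode

    def forward(self, x):
        f_r = self.resblock(x)
        f_r2 = self.cbam(f_r)
        return f_r + f_r2 if self.mode == "conventional" else f_r2

# Example: block = ResblockCBAM(nn.Conv2d(64, 64, 3, padding=1), channels=64)
```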
Step 2), designing a depth estimation network with an attention mechanism based on the Resblock-CBAM, comprises the following specific steps:
a) Four Resblock-CBAM modules are used in sequence in the encoder of the depth estimation network: the first three are Conventional Resblock-CBAM and the fourth is Modified Resblock-CBAM;
b) Five skip connections are used between the encoder and the decoder of the depth estimation network (see the sketch after this step): the first connects the first convolution layer of the encoder to the second up-convolution layer of the decoder, the second connects the first pooling layer of the encoder to the third up-convolution layer of the decoder, the third connects the first Conventional Resblock-CBAM to the fourth up-convolution layer of the decoder, the fourth connects the second Conventional Resblock-CBAM to the fifth up-convolution layer of the decoder, and the fifth connects the third Conventional Resblock-CBAM to the sixth up-convolution layer of the decoder; the Modified Resblock-CBAM is connected directly to the decoder without a skip connection.
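The sketch below (same illustrative PyTorch setting) shows one way to realize the skip wiring of b); plain stride-2 convolutions stand in for the four Resblock-CBAM stages, and the channel widths, kernel sizes, ELU activations, and single-scale two-channel sigmoid disparity head are assumptions — the patent's network additionally predicts disparities at four scales.

```python
import torch
import torch.nn as nn

def conv(in_ch, out_ch, k=3, s=1):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, k, s, k // 2), nn.ELU())

def upconv(in_ch, out_ch):
    # "up-convolution": x2 nearest-neighbour upsample followed by a 3x3 conv
    return nn.Sequential(nn.Upsample(scale_factor=2), conv(in_ch, out_ch))

class DepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder (stand-in convs mark the Resblock-CBAM stages).
        self.conv1 = conv(3, 64, k=7, s=2)   # H/2  -> skip 1
        self.pool1 = nn.MaxPool2d(3, 2, 1)   # H/4  -> skip 2
        self.block1 = conv(64, 128, s=2)     # H/8  -> skip 3 (Conventional Resblock-CBAM)
        self.block2 = conv(128, 256, s=2)    # H/16 -> skip 4 (Conventional Resblock-CBAM)
        self.block3 = conv(256, 512, s=2)    # H/32 -> skip 5 (Conventional Resblock-CBAM)
        self.block4 = conv(512, 512, s=2)    # H/64, feeds decoder directly (Modified)
        # Decoder: up-convolutions 6..1; upconvs 6..2 concatenate skips 5..1.
        self.up6, self.i6 = upconv(512, 512), conv(512 + 512, 512)
        self.up5, self.i5 = upconv(512, 256), conv(256 + 256, 256)
        self.up4, self.i4 = upconv(256, 128), conv(128 + 128, 128)
        self.up3, self.i3 = upconv(128, 64), conv(64 + 64, 64)
        self.up2, self.i2 = upconv(64, 32), conv(32 + 64, 32)
        self.up1, self.i1 = upconv(32, 16), conv(16, 16)
        self.disp = nn.Sequential(nn.Conv2d(16, 2, 3, 1, 1), nn.Sigmoid())

    def forward(self, x):                    # H and W must be divisible by 64
        s1 = self.conv1(x); s2 = self.pool1(s1)
        s3 = self.block1(s2); s4 = self.block2(s3); s5 = self.block3(s4)
        e = self.block4(s5)
        d = self.i6(torch.cat([self.up6(e), s5], 1))  # skip 5 -> 6th upconv
        d = self.i5(torch.cat([self.up5(d), s4], 1))  # skip 4 -> 5th upconv
        d = self.i4(torch.cat([self.up4(d), s3], 1))  # skip 3 -> 4th upconv
        d = self.i3(torch.cat([self.up3(d), s2], 1))  # skip 2 -> 3rd upconv
        d = self.i2(torch.cat([self.up2(d), s1], 1))  # skip 1 -> 2nd upconv
        return self.disp(self.i1(self.up1(d)))        # left/right disparity pair

# Example: DepthNet()(torch.randn(1, 3, 256, 512)) -> disparities of shape (1, 2, 256, 512)
```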
Step 3), training the depth estimation network with photometric reconstruction, disparity smoothing, and left-right disparity consistency losses on stereo image pairs, and estimating depth from a monocular image at test time, comprises the following specific steps:
a) The total training loss of the depth estimation network consists of the photometric reconstruction loss, the disparity smoothing loss, and the left-right disparity consistency loss, as shown in formula (6):
$L = \sum_s L_s, \quad L_s = \alpha_{ap}(L^l_{ap} + L^r_{ap}) + \alpha_{ds}(L^l_{ds} + L^r_{ds}) + \alpha_{lr}(L^l_{lr} + L^r_{lr})$ (6)
where $L$ is the total training loss of the depth estimation network, $L_s$ is the training loss at scale $s$, $\alpha_{ap}$, $\alpha_{ds}$, and $\alpha_{lr}$ are the weight coefficients of the photometric reconstruction loss, the disparity smoothing loss, and the left-right disparity consistency loss, respectively, $L^l_{ap}$ and $L^r_{ap}$ are the photometric reconstruction losses of the left and right images, $L^l_{ds}$ and $L^r_{ds}$ are the disparity smoothing losses of the left and right images, and $L^l_{lr}$ and $L^r_{lr}$ are the disparity consistency losses of the left and right images;
b) The photometric reconstruction loss measures the difference between an input source image and its corresponding reconstructed image, as shown in formula (7):
$L^l_{ap} = \frac{1}{N}\sum_{i,j}\Big[\alpha_1\frac{1-\mathrm{SSIM}(I^l_{ij}, \tilde{I}^l_{ij})}{2} + (1-\alpha_1)\,\big|I^l_{ij}-\tilde{I}^l_{ij}\big|\Big]$ (7)
where $L^l_{ap}$ is the photometric reconstruction loss of the left image, $N$ is the number of pixels in a single image, $I^l$ is the left image, $\tilde{I}^l$ is the reconstructed left image, SSIM is the structural similarity function, and $\alpha_1$ is the scale parameter of the SSIM term; the right-image photometric reconstruction loss $L^r_{ap}$ takes the same form;
c) The disparity smoothing loss encourages the disparity map to be locally smooth while allowing discontinuities at image gradients, as shown in formula (8):
$L^l_{ds} = \frac{1}{N}\sum_{i,j}\Big(\big|\partial_x d^l_{ij}\big|\,e^{-\|\partial_x I^l_{ij}\|} + \big|\partial_y d^l_{ij}\big|\,e^{-\|\partial_y I^l_{ij}\|}\Big)$ (8)
where $L^l_{ds}$ is the disparity smoothing loss of the left image and $d^l$ is the disparity map of the left image; the right-image disparity smoothing loss $L^r_{ds}$ takes the same form;
d) The left-right disparity consistency loss enforces agreement between the left and right disparity predictions and improves the network's depth estimation accuracy; the left-image disparity consistency loss is shown in formula (9):
$L^l_{lr} = \frac{1}{N}\sum_{i,j}\Big[\alpha_2\frac{1-\mathrm{SSIM}(d^l_{ij}, \tilde{d}^l_{ij})}{2} + (1-\alpha_2)\,\big|d^l_{ij}-\tilde{d}^l_{ij}\big|\Big]$ (9)
where $L^l_{lr}$ is the disparity consistency loss of the left image, $\alpha_2$ is the scale parameter of the SSIM term, and $\tilde{d}^l$ is the right disparity map mapped to the left view; the right-image disparity consistency loss $L^r_{lr}$ takes the same form. An illustrative sketch of these losses follows.
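The following is a minimal sketch of the three training losses in formulas (7)-(9), continuing the illustrative PyTorch setting. The 3x3 average-pooling SSIM, the normalization and sign conventions of the hypothetical `warp` helper, and the exact SSIM+L1 form of the consistency loss in formula (9) are reconstructions under stated assumptions, not verbatim from the patent.

```python
import torch
import torch.nn.functional as F

def ssim(x, y):
    """Simplified single-scale SSIM over 3x3 average-pooled local statistics."""
    C1, C2 = 0.01 ** 2, 0.03 ** 2
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    var_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    cov = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * cov + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
    return num / den

def warp(img, disp):
    """Bilinearly sample `img` shifted by horizontal disparity `disp` (given as a
    fraction of image width); the sign convention assumes left-view reconstruction."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h, device=img.device),
                            torch.linspace(-1, 1, w, device=img.device), indexing="ij")
    grid = torch.stack([xs.expand(b, -1, -1) - 2 * disp.squeeze(1),
                        ys.expand(b, -1, -1)], dim=3)
    return F.grid_sample(img, grid, align_corners=True)

def photometric_loss(img, recon, alpha1=0.85):      # formula (7)
    s = ((1 - ssim(img, recon)) / 2).clamp(0, 1)
    return (alpha1 * s + (1 - alpha1) * (img - recon).abs()).mean()

def smoothness_loss(disp, img):                     # formula (8), edge-aware
    dx_d = (disp[:, :, :, 1:] - disp[:, :, :, :-1]).abs()
    dy_d = (disp[:, :, 1:, :] - disp[:, :, :-1, :]).abs()
    dx_i = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean(1, keepdim=True)
    dy_i = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean(1, keepdim=True)
    return (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()

def lr_consistency_loss(disp_l, disp_r_to_l, alpha2=0.15):   # formula (9)
    s = ((1 - ssim(disp_l, disp_r_to_l)) / 2).clamp(0, 1)
    return (alpha2 * s + (1 - alpha2) * (disp_l - disp_r_to_l).abs()).mean()
```

Per formula (6), the per-scale losses would then be combined with weights $\alpha_{ap}$, $\alpha_{ds}$, and $\alpha_{lr}$ and summed over the four output scales; `disp_r_to_l` would be obtained by warping the right disparity map into the left view with `warp`.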
The beneficial effects of the invention are as follows:
according to the monocular unsupervised depth estimation method based on the CBAM, the convolution block attention module CBAM is introduced into an unsupervised depth estimation framework, so that dense depth estimation of monocular images is realized. In the process of introducing CBAM to a depth estimation network, combining the CBAM and Resblock into Resblock-CBAM, and extracting features of input from two dimensions of space and channel; meanwhile, the multi-scale information is fused by adopting the jump connection. By using the method provided by the invention, the attention mechanism is integrated into the depth estimation network and unsupervised training such as luminosity reconstruction, parallax smoothing, left-right parallax consistency and the like based on the image pair is performed, so that the depth details of objects in the environment can be kept and the overall depth estimation precision can be improved.
Drawings
FIG. 1 is an unsupervised depth estimation framework diagram;
FIG. 2 is a schematic diagram of a combination of a residual block and a convolution block attention module;
FIG. 3 is a schematic diagram of a sub-module in the convolution block attention module;
FIG. 4 is a diagram of a depth estimation network architecture;
FIG. 5 is a depth estimation visual quality assessment graph;
FIG. 6 is a diagram of a depth estimation experiment platform;
FIG. 7 shows depth estimation results in a real urban scene;
FIG. 8 is a table of depth estimation accuracy comparisons.
Detailed Description
The present invention is further illustrated in the following drawings and detailed description, which are to be regarded as illustrative in nature and not as restrictive.
In the monocular unsupervised depth estimation method based on CBAM, as shown in FIG. 1, CBAM and Resblock are first combined to form the Resblock-CBAM; a depth estimation network with an attention mechanism is then designed based on the Resblock-CBAM; finally, the depth estimation network is trained with photometric reconstruction, disparity smoothing, and left-right disparity consistency losses on stereo image pairs, completing depth estimation of monocular images. Steps 1) to 3) proceed exactly as described in the Disclosure of the Invention above.
In this embodiment, the unsupervised depth estimation framework is implemented in TensorFlow, and the network is trained on an NVIDIA GeForce RTX 2080 Ti graphics card with 11 GB of memory; training takes about 22 hours to converge. The weight parameters in the training loss function are set as follows: $\alpha_{ap}=1$, $\alpha_{lr}=1$, $\alpha_1=0.85$, $\alpha_2=0.15$. Since the unsupervised training with photometric reconstruction, disparity smoothing, and left-right disparity consistency losses on image pairs is performed at four scales, $\alpha_{ds}$ is set to 1, 0.5, 0.25, and 0.125 at disp1, disp2, disp3, and disp4, respectively. To better measure the accuracy of depth estimation, five evaluation indices are defined as follows:
Abs Rel: $\frac{1}{T}\sum_{k}\frac{|d_k - d_k^*|}{d_k^*}$; Sq Rel: $\frac{1}{T}\sum_{k}\frac{(d_k - d_k^*)^2}{d_k^*}$; RMSE: $\sqrt{\frac{1}{T}\sum_{k}(d_k - d_k^*)^2}$;
RMSE log: $\sqrt{\frac{1}{T}\sum_{k}(\log d_k - \log d_k^*)^2}$; Threshold: % of $d_k$ such that $\max\big(\frac{d_k}{d_k^*}, \frac{d_k^*}{d_k}\big) = \delta < \mathrm{thr}$, where $T$ is the number of evaluated pixels in the test image, and $d_k$ and $d_k^*$ are the predicted depth and true depth of the $k$-th pixel, respectively.
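A small NumPy sketch of these five indices, under the assumption that `pred` and `gt` are flattened arrays of matched predicted and ground-truth depths over the valid pixels of one test image:

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard Eigen-split depth evaluation indices."""
    ratio = np.maximum(gt / pred, pred / gt)               # threshold accuracies
    a1, a2, a3 = [(ratio < 1.25 ** p).mean() for p in (1, 2, 3)]
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    sq_rel = np.mean((pred - gt) ** 2 / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))
    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3
```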
An Eigen split test set containing 697 images from 29 scenes is selected to test the network trained on the KITTI dataset, and accuracy comparison and visual quality evaluation are carried out against other existing methods. FIG. 8 compares the accuracy of the proposed method with other depth estimation methods on the Eigen split test set, where a1, a2, and a3 denote $\delta < 1.25$, $\delta < 1.25^2$, and $\delta < 1.25^3$, respectively. As shown in FIG. 8, the proposed method performs best among the compared unsupervised depth estimation methods; compared with several mainstream supervised depth estimation methods, it is inferior only to ACA, a depth estimation method based on an attention aggregation network, and superior to the other supervised methods trained with ground truth. FIG. 5 shows the visual quality evaluation of the proposed method against two mainstream unsupervised depth estimation methods; it can be seen that, with the convolutional block attention module, the proposed method better preserves the depth details of objects in the environment, and its overall visual quality exceeds that of the other two methods.
To better demonstrate the cross-scene generalization capability of unsupervised depth estimation over supervised depth estimation, the network trained on the KITTI dataset is applied in a depth estimation experiment on urban road scenes in part of Nanjing. FIG. 6 shows the experimental platform for depth estimation in a real environment, and the depth estimation results of the proposed method are shown in FIG. 7; the trained network achieves satisfactory visual quality of depth estimation in unknown scenes and preserves the depth details of many nearby objects.
The technical means disclosed by the scheme of the invention are not limited to those disclosed in the above embodiment, and also include improvements and modifications based on the above technical features, which are likewise considered to fall within the protection scope of the invention.
Claims (5)
1. A monocular unsupervised depth estimation method based on CBAM, characterized in that the method comprises the following steps:
step 1), combining CBAM and Resblock to form a Resblock-CBAM, with the following specific steps:
a) The channel attention sub-module and the spatial attention sub-module in the CBAM are connected in sequence; the CBAM is then combined with a Resblock in parallel to form the Conventional Resblock-CBAM, whose output equation is shown in formula (1):
$F_C = F_r + F_r''$ (1)
where $F_r$ is the output feature of the Resblock, $F_r''$ is the output feature of the CBAM spatial attention sub-module, and $F_C$ is the output feature of the Conventional Resblock-CBAM;
b) The channel attention sub-module and the spatial attention sub-module in the CBAM are connected in sequence; the CBAM is then connected with a Resblock in series to form the Modified Resblock-CBAM, whose output equation is shown in formula (2):
$F_M = F_r''$ (2)
where $F_M$ is the output feature of the Modified Resblock-CBAM;
c) The overall process of the channel attention sub-module and the spatial attention sub-module in the CBAM is shown in formula (3):
$F_r' = M_c(F_r) \otimes F_r, \quad F_r'' = M_s(F_r') \otimes F_r'$ (3)
where $F_r'$ is the output feature of the CBAM channel attention sub-module, $M_c$ is the one-dimensional channel attention map, $M_s$ is the two-dimensional spatial attention map, and $\otimes$ denotes element-wise multiplication;
step 2), designing a depth estimation network with an attention mechanism based on the Resblock-CBAM;
and step 3), training the depth estimation network with photometric reconstruction, disparity smoothing, and left-right disparity consistency losses on stereo image pairs, and completing depth estimation of monocular images.
2. The CBAM-based monocular unsupervised depth estimation method according to claim 1, characterized in that the specific process of the channel attention sub-module is shown in formula (4):
$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(\omega_1(\omega_0(F^c_{avg})) + \omega_1(\omega_0(F^c_{max}))\big)$ (4)
where $\sigma$ denotes the sigmoid function, MLP is a multi-layer perceptron, $\omega_0$ and $\omega_1$ are the weights of the multi-layer perceptron, and $F^c_{avg}$ and $F^c_{max}$ are the channel descriptors produced by average pooling and max pooling.
3. The CBAM-based monocular unsupervised depth estimation method according to claim 1, characterized in that the specific process of the spatial attention sub-module is shown in formula (5):
$M_s(F) = \sigma\big(f^{7\times 7}([F^s_{avg}; F^s_{max}])\big)$ (5)
where $f^{7\times 7}$ denotes a convolution with a 7×7 filter, and $F^s_{avg}$ and $F^s_{max}$ are the spatial descriptors produced by average pooling and max pooling.
4. The CBAM-based monocular unsupervised depth estimation method according to claim 1, characterized in that designing the depth estimation network with an attention mechanism based on the Resblock-CBAM in step 2) comprises the following specific steps:
a) Four Resblock-CBAM modules are used in sequence in the encoder of the depth estimation network: the first three are Conventional Resblock-CBAM and the fourth is Modified Resblock-CBAM;
b) Five skip connections are used between the encoder and the decoder of the depth estimation network: the first connects the first convolution layer of the encoder to the second up-convolution layer of the decoder, the second connects the first pooling layer of the encoder to the third up-convolution layer of the decoder, the third connects the first Conventional Resblock-CBAM to the fourth up-convolution layer of the decoder, the fourth connects the second Conventional Resblock-CBAM to the fifth up-convolution layer of the decoder, and the fifth connects the third Conventional Resblock-CBAM to the sixth up-convolution layer of the decoder; the Modified Resblock-CBAM is connected directly to the decoder without a skip connection.
5. The CBAM-based monocular unsupervised depth estimation method according to claim 1, characterized in that step 3) trains the depth estimation network with photometric reconstruction, disparity smoothing, and left-right disparity consistency losses on stereo image pairs and performs monocular image depth estimation at test time, comprising the following specific steps:
a) The total training loss of the depth estimation network consists of the photometric reconstruction loss, the disparity smoothing loss, and the left-right disparity consistency loss, as shown in formula (6):
$L = \sum_s L_s, \quad L_s = \alpha_{ap}(L^l_{ap} + L^r_{ap}) + \alpha_{ds}(L^l_{ds} + L^r_{ds}) + \alpha_{lr}(L^l_{lr} + L^r_{lr})$ (6)
where $L$ is the total training loss of the depth estimation network, $L_s$ is the training loss at scale $s$, $\alpha_{ap}$, $\alpha_{ds}$, and $\alpha_{lr}$ are the weight coefficients of the photometric reconstruction loss, the disparity smoothing loss, and the left-right disparity consistency loss, respectively, $L^l_{ap}$ and $L^r_{ap}$ are the photometric reconstruction losses of the left and right images, $L^l_{ds}$ and $L^r_{ds}$ are the disparity smoothing losses of the left and right images, and $L^l_{lr}$ and $L^r_{lr}$ are the disparity consistency losses of the left and right images;
b) The photometric reconstruction loss measures the difference between an input source image and its corresponding reconstructed image, as shown in formula (7):
$L^l_{ap} = \frac{1}{N}\sum_{i,j}\Big[\alpha_1\frac{1-\mathrm{SSIM}(I^l_{ij}, \tilde{I}^l_{ij})}{2} + (1-\alpha_1)\,\big|I^l_{ij}-\tilde{I}^l_{ij}\big|\Big]$ (7)
where $L^l_{ap}$ is the photometric reconstruction loss of the left image, $N$ is the number of pixels in a single image, $I^l$ is the left image, $\tilde{I}^l$ is the reconstructed left image, SSIM is the structural similarity function, and $\alpha_1$ is the scale parameter of the SSIM term; the right-image photometric reconstruction loss $L^r_{ap}$ takes the same form;
c) The disparity smoothing loss encourages the disparity map to be locally smooth while allowing discontinuities at image gradients, as shown in formula (8):
$L^l_{ds} = \frac{1}{N}\sum_{i,j}\Big(\big|\partial_x d^l_{ij}\big|\,e^{-\|\partial_x I^l_{ij}\|} + \big|\partial_y d^l_{ij}\big|\,e^{-\|\partial_y I^l_{ij}\|}\Big)$ (8)
where $L^l_{ds}$ is the disparity smoothing loss of the left image and $d^l$ is the disparity map of the left image; the right-image disparity smoothing loss $L^r_{ds}$ takes the same form;
d) The left-right disparity consistency loss enforces agreement between the left and right disparity predictions and improves the network's depth estimation accuracy; the left-image disparity consistency loss is shown in formula (9):
$L^l_{lr} = \frac{1}{N}\sum_{i,j}\Big[\alpha_2\frac{1-\mathrm{SSIM}(d^l_{ij}, \tilde{d}^l_{ij})}{2} + (1-\alpha_2)\,\big|d^l_{ij}-\tilde{d}^l_{ij}\big|\Big]$ (9)
where $L^l_{lr}$ is the disparity consistency loss of the left image, $\alpha_2$ is the scale parameter of the SSIM term, and $\tilde{d}^l$ is the right disparity map mapped to the left view; the right-image disparity consistency loss $L^r_{lr}$ takes the same form.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110142746.6A | 2021-02-02 | 2021-02-02 | Monocular unsupervised depth estimation method based on CBAM |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN112950697A | 2021-06-11 |
| CN112950697B | 2024-04-16 |
Family

ID=76241549

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110142746.6A | Monocular unsupervised depth estimation method based on CBAM | 2021-02-02 | 2021-02-02 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN112950697B |
Citations (2)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111739082A | 2020-06-15 | 2020-10-02 | | An unsupervised depth estimation method for stereo vision based on convolutional neural networks |
| US11138751B2 | 2019-07-06 | 2021-10-05 | Toyota Research Institute, Inc. | Systems and methods for semi-supervised training using reprojected distance loss |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |