Depth estimation using Convolutional Neural Network with transfer
learning
Abstract
Predicting depth is crucial for understanding the 3-D geometry of a scene. For stereo images, local correspondence
suffices for estimation, whereas estimating depth from a single image is less straightforward and requires the
integration of both global and local information. In this paper, the authors address depth estimation using a
Convolutional Neural Network (CNN) with transfer learning. The authors propose a relatively shallow CNN that uses
transfer learning to extract low-level features. A fully convolutional architecture is employed, in which low-level
image features are first extracted by pre-trained ResNet-50 and VGG19 networks. Transfer learning is performed by
using the initial layers of ResNet-50 and VGG19, connected in parallel, without downsampling anywhere in the
proposed architecture, so that the size of the depth map can be recovered. Because it is not a very deep architecture,
the network can run in real time on images given sufficient computing power. The model is compared with a pure CNN
network to illustrate the effectiveness of transfer learning. It is demonstrated that the proposed CNN with transfer
learning yields better results than a pure CNN and converges faster. The influence of different loss functions during
training is also shown. The results are presented through qualitative visualization and quantitative metrics.
Keywords
Transfer learning, CNN, ResNet-50, VGG19.
1. Introduction
To achieve a truly humanoid robot, 3-D information about the environment is one of the critical capabilities a
robot must have. Historically, depth information has predominantly been used in tasks that localize a robot or a
self-driving car relative to its environment [8], using techniques such as visual odometry (VO) or simultaneous
localization and mapping (SLAM), as well as in 3-D modelling [1][2]. Estimating depth is an essential part of
understanding the relative geometric relations within an environment. In turn, these relations provide a richer
representation of objects and their surroundings, which often leads to improvements in existing tasks such as
recognition [6], physics and robotics [7][5]; for example, depth helps in determining the relative pose of the
camera and can be used in reasoning about occlusions.
Estimating depth from a monocular image is a known ill-posed problem, because a single RGB image may correspond
to an infinite number of real-world scenes. Several classic algorithms have previously been applied to this
problem, including Structure from Motion, which leverages camera motion to estimate camera poses across different
temporal intervals and, in turn, estimates depth via triangulation from pairs of successive views.
As drones become smaller and smaller, the monocular approach to depth estimation becomes increasingly interesting,
because the stereo case degenerates to the monocular case when the baseline of a stereo camera is small compared
to the distance of the objects or landmarks from the camera.
Recently, CNNs have been employed to learn depth from an RGB image. Motivated by recent developments in AI, we
implemented an end-to-end trainable, relatively shallow CNN architecture combined with transfer learning to learn
a mapping between colour-image pixel intensities and the corresponding depth map.
In the context of depth estimation, the absolute scale of the surrounding environment corresponds to the actual
size of the objects in the physical world. A depth map that correctly predicts the absolute scale of the scene
contains depth values close to the actual depth values. Conversely, a depth map that reflects the relative depth
of the scene preserves the same relations between pixels (higher/lower depth) as the true depth map. In this work,
we predict the relative scale of the scene, since it is more intuitive to predict from a single image and
therefore easier to train.
2. Literature Survey
2.1 Eigen et al. [3] were the first to propose a CNN-based solution for depth estimation. They used a two-scale
CNN consisting of a coarse-scale network and a fine-scale network. The coarse-scale network is a CNN that
identifies the geometry of the scene in a global context; its output is a low-resolution depth image. This output,
together with the original input image, is then fed to the fine-scale network, a CNN of three convolutional layers
used to refine the coarse prediction. Additionally, Eigen et al. [3] considered the issue of scale invariance:
they used a scale-invariant error function for performance evaluation and a scale-invariant loss function
$L(y, y^*)$ for training.
$$L(y, y^*) = \frac{1}{n} \sum_{i} \left( \log y_i - \log y_i^* + \frac{1}{n} \sum_{j} \left( \log y_j^* - \log y_j \right) \right)^2 \qquad (2.1)$$
where $y_i$ denotes the predicted depth value for pixel $i$ and $y_i^*$ is the true depth value for that pixel.
Scale invariance is achieved through the inner sum, which is the mean log difference between the predicted and
ground-truth depth values.
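As an illustration, a minimal NumPy sketch of equation (2.1) could look as follows; it assumes flattened arrays of strictly positive depth values and is not part of the original work:

import numpy as np

def scale_invariant_loss(y_pred, y_true):
    """Scale-invariant loss of Eigen et al., equation (2.1).

    y_pred, y_true: flattened arrays of positive depth values (linear scale).
    """
    d = np.log(y_pred) - np.log(y_true)       # per-pixel log difference
    n = d.size
    # Subtracting the mean log difference makes the loss invariant to a
    # global scaling of the prediction.
    return np.sum((d - np.mean(d)) ** 2) / n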
2.2 Xiaobai Ma, Zhenglin Geng and Zhi Bie [4] compared three different CNN architectures using three evaluation
metrics and three different loss functions. The first architecture is prone to overfitting, since the number of
parameters in the fully connected (FC) layer is 73,293,824, which is enormous compared to the convolutional
layers, which have only 27,232. The second architecture uses only convolutional layers to overcome the overfitting
problem, but it is not deep enough to generalize depth and can give results equal to, or only slightly better
than, the first architecture if trained for sufficient time. The third architecture employs transfer learning and
gives the best results of the three.
3. Network Architecture
We experimented with two CNN architectures: model 1, a pure/basic CNN, and model 2, a CNN with transfer learning.
The better of the two is chosen after qualitative and quantitative analysis. Each architecture is illustrated in
the following subsections, and a comparison of their performance is given in the experiments section.
Model 1 is a pure CNN model without transfer learning and is shown in figure 1. Model 2 is a deeper CNN model with
transfer learning, shown in figure 2. For the transfer-learning architecture, i.e., Model 2, we chose two
pre-trained architectures, ResNet50 and VGG19. We use each pre-trained network only up to the point where the
spatial dimension of the feature map is greater than or equal to that of the output depth map; thus ten layers of
VGG19 and 12 layers of ResNet50 are used in the proposed transfer-learning model (Model 2). After making the
output dimensions of both pre-trained models the same, we concatenate them and then add further convolution and
max-pooling layers to learn high-level features. The output of both models is flattened to a 4,070-element 1-D
array for training purposes. Note that information is lost if the image is first reduced in size and then enlarged
again, i.e., if a picture is decompressed after being compressed. Therefore, we truncate the pre-trained models at
the layers where the resolution of the feature map is still greater than or equal to 55x74, to avoid losing
information by first compressing and then decompressing the image.
Furthermore, large models such as the full ResNet50 and VGG19 require a large amount of memory to fit on a
computer. The two reasons for not using the full ResNet50 and VGG19 architectures are therefore to avoid loss of
information and to manage memory; training time is also a crucial factor.
3.1 Pure/Basic CNN
As fully connected networks suffer from overfitting, as shown by Xiaobai Ma, Zhenglin Geng and Zhi Bie [4] for
their first architecture, we started from a pure CNN. The pure CNN architecture they used does not consider the
global and local relations of the scene. Our proposed pure/basic CNN architecture (model 1) is shown in figure 1
and considers both global and local scene information. The network consists of 8 convolution layers before the
output layer, three of which consider the global relations of the scene, and we expect this network to behave
better than the second architecture described in section 2.2. Each convolution layer is followed by batch
normalization, to facilitate the training process, and then by a ReLU activation layer.
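As an illustration of this repeated pattern, a minimal Keras sketch of one convolution, batch-normalization and ReLU block might look as follows; the filter count and kernel size are left as parameters, since the text specifies only the overall layout (see figure 1):

from tensorflow.keras import layers

def conv_bn_relu(x, filters, kernel_size):
    """One basic building block of Model 1: convolution, then batch norm, then ReLU."""
    x = layers.Conv2D(filters, kernel_size, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)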
3.2 CNN with transfer learning
Training a neural network architecture from scratch is a tiresome process that requires a lot of data and
processing power, and it becomes even more cumbersome when training on images, where the number of features is
large compared to simpler tasks. Therefore, we employed transfer learning in our second method, i.e., model 2.
The architecture of the CNN with transfer learning (model 2) is shown in figure 2. The network initially consists
of two parallel pre-trained architectures, ResNet50 and VGG19, which are concatenated after their dimensions are
made equal. After this point, the network has six more convolutional layers before the output layer, which is a
flattened 1-D array.
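A hedged Keras sketch of this layout is given below: two ImageNet-pretrained backbones truncated early so that the spatial resolution stays at or above 55x74, brought to a common size, concatenated and followed by further convolutions. The truncation layer names, the resizing step and the filter counts are illustrative assumptions, not the exact configuration of figure 2:

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50, VGG19

inputs = layers.Input(shape=(228, 304, 3))

resnet = ResNet50(include_top=False, weights="imagenet", input_tensor=inputs)
vgg = VGG19(include_top=False, weights="imagenet", input_tensor=inputs)

# Hypothetical cut points: take an early intermediate activation from each backbone
# while its feature map is still larger than the 55x74 output depth map.
res_feat = resnet.get_layer("conv2_block1_out").output   # assumed cut point
vgg_feat = vgg.get_layer("block3_conv4").output          # assumed cut point

# Bring both feature maps to the same spatial size before concatenation.
res_feat = layers.Resizing(55, 74)(res_feat)
vgg_feat = layers.Resizing(55, 74)(vgg_feat)
x = layers.Concatenate()([res_feat, vgg_feat])

for filters in (128, 64, 32):                            # assumed filter counts
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
x = layers.Conv2D(1, 3, padding="same")(x)               # one-channel depth map
outputs = layers.Flatten()(x)                            # 55 * 74 = 4070 values
model_2 = models.Model(inputs, outputs)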
Figure 1. Block diagram of Model 1
Figure 2. Block diagram of Model 2
4. Dataset Used
For this research, the raw Kinect depth data and the raw RGB camera output from the NYU Depth v2 dataset [8] are
used. The data contains images at 640x480 resolution, but for training, data augmentation was performed and the
RGB images were resized to 304x228 and the raw Kinect depth data to 74x55 to form our training set, since it is
easier to train the model with fewer features. We used Python for the data augmentation. We trained our model on
32 images, because of the training time required as the training data grows and because memory restrictions limit
how much data can be fitted at a time.
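As a small illustration of this resizing step, the following sketch assumes the RGB frames and Kinect depth maps are already loaded as NumPy arrays; OpenCV is used here purely for illustration and is not mandated by the paper:

import numpy as np
import cv2  # OpenCV, used here only for resizing

def preprocess_pair(rgb, depth):
    """rgb: 480x640x3 uint8 image; depth: 480x640 float array of depths."""
    rgb_small = cv2.resize(rgb, (304, 228), interpolation=cv2.INTER_AREA)
    depth_small = cv2.resize(depth, (74, 55), interpolation=cv2.INTER_NEAREST)
    rgb_small = rgb_small.astype(np.float32) / 255.0   # scale pixels to [0, 1]
    return rgb_small, depth_small.reshape(-1)          # flatten depth to 4070 values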
KITTI is another popular dataset used by researchers in similar projects. Since our computational resources are
limited, the feasibility of the study is assessed only on the training results on the NYU dataset.
5. Loss functions for training
We used two loss functions and compared the results: the root mean squared error (RMSE) and the root mean squared
logarithmic error (RMSLE), both for comparison and for training the proposed models.
A depth value $y$ in logarithmic space (log space) is equal to $\log(y_{lin})$, where $y_{lin}$ is the true depth
in linear space. For depth values in log space, the Euclidean loss does not minimize the difference between the
estimated depth and the ground truth in linear space; instead it minimizes the difference between
$y = \log y_{lin}$ and $\log y_{lin}^*$, which can be rewritten as $\log(y_{lin} / y_{lin}^*)$. This means that a
loss in log space optimizes the ratio between $y$ and $y^*$ and achieves its minimum when the ratio is 1. Later
experiments show that the logarithmic approach achieves a better prediction of the relative depth structure.
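For concreteness, the two losses can be written as simple TensorFlow functions. This is a minimal sketch assuming strictly positive depth values; the small epsilon guard is an added safety detail and not part of the paper:

import tensorflow as tf

def rmse(y_true, y_pred):
    """Root mean squared error in linear depth space."""
    return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true)))

def rmsle(y_true, y_pred):
    """Root mean squared error of log depths, i.e. a loss on the depth ratio."""
    eps = 1e-6  # guard against log(0)
    log_diff = tf.math.log(y_pred + eps) - tf.math.log(y_true + eps)
    return tf.sqrt(tf.reduce_mean(tf.square(log_diff)))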
6. Training and testing
We compared both architectures, as mentioned earlier, fixing the same number of input parameters and the same cost
function (RMSE), and use the one that gives the best results and also generalizes faster and better than the other.
After choosing the architecture, we trained the model with two different loss functions and compared the results,
which can be seen in the subsequent sections; we then chose the better of the two. The two loss functions are
stated in section 5.
We trained our network on an Amazon cloud GPU (Tesla K80), since the training time is high: the proposed model is
a deep neural network architecture, the number of input features is in the hundreds of thousands rather than
hundreds or thousands, and the output is not a simple classification task or a single-output regression task, but
has a dimensionality in the thousands, unlike the usual case in other deep-learning regression problems.
7. Results and discussion
In this section, we provide qualitative and quantitative analysis of our evaluations. We also
show the performance of different loss functions for the task. All the experiments are
implemented using the TensorFlow framework.
7.1 Results for choosing the best CNN architecture out of two
The better of the two architectures proposed in section 3 is chosen by training on 32 training samples with RMSE
as the loss function for 250 iterations. The RGB image is shown in figure 3, the true depth map in figure 4, and
the predicted results in figures 5 and 6.
Model 1 corresponds to the CNN model without transfer learning. Model 2 corresponds to
the CNN model with transfer learning.
Figure 3. True RGB image
Figure 4. True depth map
Figure 5. Predicted depth map of model 1
Figure 6. Predicted depth map of model 2
From figures 5 and 6, it is clear that the CNN model with transfer learning, i.e., Model 2, gives better results
than the one without transfer learning, i.e., Model 1, after training both of them for 250 iterations. This is
expected, because we are transferring the knowledge of two different pre-trained models to our proposed network.
Figure 7. Model 1 and Model 2 training error comparison.
It is evident from figure 7 that Model 2 converges, or generalizes, faster than Model 1, even though Model 2 is
deeper and has more parameters to train and is thus harder to train: Model 2 has 3,725,717 parameters whereas
Model 1 has 711,682. It can be concluded that Model 2 is superior to Model 1, and thus using transfer learning
helps considerably and was the right decision.
7.2 Results for choosing the best loss function out of two
Figure 8 shows the true depth map for reference. From figures 9, 10 and 17, it can be concluded that the RMSLE
loss function converges faster than the RMSE loss function for the same model. This is expected, because RMSLE
takes the ratio of the predicted and true depth to calculate the loss, and it is a property of the log function
that its value changes quickly when the input is around 1 and is 0 when the input is exactly 1, whereas RMSE is
merely the linear difference between the predicted and true depth values. Figures 11 and 12 show, for reference,
the predicted output of the model after 750 iterations for both loss functions.
Therefore, in the case of RMSLE, the rate of change of the error, and the error itself, is very high at the start;
as the number of iterations increases, the rate of change of the error decreases because the ratio approaches 1
and becomes very small, after which the error becomes stagnant, as can be seen in figure 16. In the case of RMSE,
by contrast, the rate of change of the error is slow at the start compared to RMSLE and becomes stagnant after
enough iterations, as can be seen in figure 15.
As the number of iterations increases to around 1000 or more, both loss functions become almost equal and it is
hard to differentiate between them; this can be seen in figures 13, 14 and 17. The loss reaches a certain
threshold at around 1300 iterations and becomes stagnant thereafter. After the loss becomes almost constant, the
network starts fine-tuning the weights.
Figure 8. True depth image
Figure 9. Predicted depth image with RMSE loss
Figure 10. Predicted depth image with RMSLE loss
Figure 11. Predicted depth image with RMSE loss
Figure 12. Predicted depth image with RMSLE loss
(1000 iterations)
Figure 13. Predicted depth image with RMSE loss
Figure 14. Predicted depth image with RMSLE loss
Figure 15. RMSE loss graph
Figure 16. RMSLE loss graph
Figure 17. RMSE and RMSLE loss functions plotted together.
For the chosen model, i.e., Model 2, we used different learning rates and different batch sizes for 250, 500 and
1000 epochs respectively, as shown in table 1. At the start, there is scope for a higher learning rate, since the
error is large at the beginning, as can be concluded from the previous discussion. As the number of epochs
increases, the rate of change of the error becomes small, and therefore the learning rate should be smaller; in
other words, it should decay with the number of epochs. We started with a small batch and increased the batch size
in steps after a defined number of epochs, as shown in table 1, because the error is higher at the start, so the
weights should be updated more frequently to decrease the error faster.
No. of Epochs Learning rate (LR) Batch Size
0-250 0.001 4
250-750 0.001 16
750-1750 0.0001 32
Table 1. Final CNN Model Training Parameters (Model 2)
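A hedged sketch of how this staged schedule could be driven in Keras follows; it assumes the model_2 and rmsle definitions sketched earlier, training arrays x_train and y_train, and an Adam optimizer, which is an assumption, since the paper specifies only the learning rates, batch sizes and epoch ranges of table 1:

import tensorflow as tf

# (epochs in stage, learning rate, batch size), following Table 1.
stages = [
    (250, 1e-3, 4),    # epochs 0-250
    (500, 1e-3, 16),   # epochs 250-750
    (1000, 1e-4, 32),  # epochs 750-1750
]

for n_epochs, lr, batch_size in stages:
    # Re-compiling per stage is a simple way to change the learning rate;
    # note that it also resets the optimizer state.
    model_2.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr), loss=rmsle)
    model_2.fit(x_train, y_train, epochs=n_epochs, batch_size=batch_size)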
We can compare the performance of the different loss functions using Model 2 from table 2. Model 2 trained with
RMSE yields the smallest Abs. Rel. Difference, while Model 2 trained with RMSLE yields the smallest RMSE.
Loss function    Abs. Rel. Difference    RMSE
RMSE             0.1594                  0.1158
RMSLE            0.1598                  0.1156
Table 2. Metric evaluation of the different loss functions (Model 2)
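For reference, the two evaluation metrics of table 2 can be computed with a few lines of NumPy; this is an illustrative sketch assuming flattened arrays of predicted and ground-truth depths:

import numpy as np

def abs_rel_difference(y_pred, y_true):
    """Mean absolute relative difference between predicted and true depths."""
    return np.mean(np.abs(y_pred - y_true) / y_true)

def rmse_metric(y_pred, y_true):
    """Root mean squared error between predicted and true depths."""
    return np.sqrt(np.mean((y_pred - y_true) ** 2))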
8. Conclusion
This research uses convolutional neural networks to address the problem of depth estimation from a single image.
Depth estimation from a single camera is ambiguous and hard to train for, even with advanced CNNs. The
transfer-learning approach generalized the results faster. The proposed model explicitly utilizes transfer
learning to predict the depth map. The results of the experiments show that transfer learning increases the
performance of the network; this is evident in the qualitative and quantitative assessment of the output depth
maps shown in section 7. Furthermore, the results in section 7 show that using the RMSLE loss function improves
the learning capability of the proposed model and is more suitable in our case than RMSE. We therefore propose the
transfer-learning model, i.e., model 2, with the RMSLE loss function for the task of depth estimation from a
single image.
Conflict of Interest: The authors declare no conflict of interest and have no financial or non-financial interests
to disclose.
Data availability: The dataset can be accessed at the URL below (data size is around 10 GB).
https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html.
References
1. A. Saxena, M. Sun, and A. Y. Ng. Make3D: Learning 3-D Scene Structure from a Single Still Image. TPAMI, 2008.
2. D. Hoiem, A. A. Efros, and M. Hebert. Automatic Photo Pop-Up. In ACM SIGGRAPH, pages 577–584, 2005.
3. D. Eigen, C. Puhrsch, and R. Fergus. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network.
CoRR, vol. abs/1406.2283, 2014.
4. X. Ma, Z. Geng, and Z. Bie. CS231n course project report, Stanford University, 2017.
http://cs231n.stanford.edu/reports/2017/pdfs/203.pdf
5. J. Michels, A. Saxena, and A. Y. Ng. High-Speed Obstacle Avoidance Using Monocular Vision and Reinforcement
Learning. In ICML, pages 593–600, 2005.
6. N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. Indoor Segmentation and Support Inference from RGBD Images.
In ECCV, 2012.
7. R. Hadsell, P. Sermanet, J. Ben, A. Erkan, M. Scoffier, K. Kavukcuoglu, U. Muller, and Y. LeCun. Learning
Long-Range Vision for Autonomous Off-Road Driving. Journal of Field Robotics, 26(2):120–144, 2009.
8. J. Shotton, R. Girshick, A. Fitzgibbon, et al. Decision Forests for Computer Vision and Medical Image Analysis,
chapter Efficient Human Pose Estimation from Single Depth Images, pages 175–192. Springer London, 2013.
ISBN 978-1-4471-4929-3. doi:10.1007/978-1-4471-4929-3_13.