A Study on the Detection of Cattle in UAV Images Using Deep Learning
Figure 1. Examples of well-lit (left) and too dark (right) conditions.
Figure 2. Examples of blurry (left) and too bright (right) conditions.
Figure 3. Examples of lush pasture (left), dry pasture (center), and exposed soil (right).
Figure 4. Examples of animal occlusions: tree branches (left), shed roof (middle), electrical wires (right).
Figure 5. Examples of image blocks containing carefully framed animals (top) and image blocks generated using a regular grid (bottom).
Figure 6. Workflow used to train all models considered in the experiments.
Figure 7. Range of accuracies obtained for each CNN architecture. The three bars associated with each architecture correspond to input sizes of 224 × 224 (left), 112 × 112 (middle), and 56 × 56 (right). The circle in each bar represents the average accuracy, and the bottom and top extremities represent the lowest and highest accuracies observed during 10-fold cross-validation.
Figure 8. Examples of the two possible approaches to simulate coarser ground sample distances (GSDs). On the left, the entire image is downsampled and then the regular grid is applied; on the right, the grid is applied first and then the image blocks are downsampled.
Figure 9. Examples of “cattle” blocks with very small animal parts visible (red ellipses).
Abstract
1. Introduction
- All experiments were carried out with animals of the Nelore (Bos indicus) and Canchim breeds. The Canchim breed is a cross between the Charolais (Bos taurus) and Nelore breeds, with the latter lending most of its visual traits. To the authors’ knowledge, there are no studies in the literature using aerial images of either the Nelore or Canchim breeds. Because visual differences between the two breeds were slight in most cases, breed identification using the CNNs was not attempted.
- Some experiments were designed specifically to determine the ideal ground sample distance (GSD) when CNNs are used, taking into consideration both accuracy and area covered.
- Fifteen of the most successful CNN architectures were compared using a 10-fold cross-validation procedure to avoid spurious or unrealistic results; a minimal training sketch illustrating this procedure is given after this list.
- The image dataset used in the experiments includes images captured under a wide variety of conditions. To guarantee as much data variability as possible, images were captured under different weather conditions (sunny and overcast), at different times of the day and of the year, and under different pasture conditions. Each of those factors is carefully analyzed and discussed, thus qualifying the results observed.
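For readers interested in how such a comparison can be organized, the block below is a minimal sketch of a 10-fold cross-validation loop that fine-tunes an ImageNet-pretrained backbone on image blocks labeled as containing cattle or not. The choice of backbone, optimizer, number of epochs, batch size, and the omission of input preprocessing are illustrative assumptions, not the exact configuration used in the experiments.

```python
# Hypothetical sketch: 10-fold cross-validation over image blocks labeled
# "cattle" (1) / "no cattle" (0), fine-tuning a pretrained CNN per fold.
# Backbone, hyperparameters, and preprocessing are illustrative assumptions.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import StratifiedKFold

def build_model(input_size=224):
    # ImageNet-pretrained backbone with a small binary-classification head.
    base = tf.keras.applications.ResNet101V2(
        include_top=False, weights="imagenet",
        input_shape=(input_size, input_size, 3), pooling="avg")
    out = tf.keras.layers.Dense(1, activation="sigmoid")(base.output)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

def cross_validate(images, labels, n_folds=10, input_size=224):
    # images: (N, H, W, 3) float array of blocks; labels: (N,) 0/1 array.
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    accuracies = []
    for train_idx, test_idx in skf.split(images, labels):
        model = build_model(input_size)
        model.fit(images[train_idx], labels[train_idx],
                  epochs=10, batch_size=32, verbose=0)
        _, acc = model.evaluate(images[test_idx], labels[test_idx], verbose=0)
        accuracies.append(acc)
    # Average, lowest, and highest fold accuracies (as plotted in Figure 7).
    return np.mean(accuracies), np.min(accuracies), np.max(accuracies)
```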
2. Materials and Methods
2.1. Image Dataset
2.2. Experimental Setup
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Barbedo, J.G.A.; Koenigkan, L.V. Perspectives on the use of unmanned aerial systems to monitor cattle. Outlook Agric. 2018, 47, 214–222. [Google Scholar] [CrossRef] [Green Version]
- Barbedo, J.G.A. A Review on the Use of Unmanned Aerial Vehicles and Imaging Sensors for Monitoring and Assessing Plant Stresses. Drones 2019, 3, 40. [Google Scholar] [CrossRef] [Green Version]
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597. [Google Scholar]
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. arXiv 2016, arXiv:1612.08242. [Google Scholar]
- Kellenberger, B.; Marcos, D.; Tuia, D. Detecting mammals in UAV images: Best practices to address a substantially imbalanced dataset with deep learning. Remote Sens. Environ. 2018, 216, 139–153. [Google Scholar] [CrossRef] [Green Version]
- Chrétien, L.P.; Théau, J.; Ménard, P. Visible and thermal infrared remote sensing for the detection of white-tailed deer using an unmanned aerial system. Wildl. Soc. Bull. 2016, 40, 181–191. [Google Scholar] [CrossRef]
- Franke, U.; Goll, B.; Hohmann, U.; Heurich, M. Aerial ungulate surveys with a combination of infrared and high-resolution natural colour images. Anim. Biodivers. Conserv. 2012, 35, 285–293. [Google Scholar]
- Witczuk, J.; Pagacz, S.; Zmarz, A.; Cypel, M. Exploring the feasibility of unmanned aerial vehicles and thermal imaging for ungulate surveys in forests - preliminary results. Int. J. Remote Sens. 2017. [Google Scholar] [CrossRef]
- Lhoest, S.; Linchant, J.; Quevauvillers, S.; Vermeulen, C.; Lejeune, P. How many hippos (HOMHIP): algorithm for automatic counts of animals with infra-red thermal imagery from UAV. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2015, XL-3/W3, 355–362. [Google Scholar] [CrossRef] [Green Version]
- Mulero-Pázmány, M.; Stolper, R.; Essen, L.; Negro, J.J.; Sassen, T. Remotely Piloted Aircraft Systems as a Rhinoceros Anti-Poaching Tool in Africa. PLoS ONE 2014, 9, e83873. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Vermeulen, C.; Lejeune, P.; Lisein, J.; Sawadogo, P.; Bouche, P. Unmanned Aerial Survey of Elephants. PLoS ONE 2013, 8, e54700. [Google Scholar] [CrossRef] [Green Version]
- Chamoso, P.; Raveane, W.; Parra, V.; González, A. UAVs Applied to the Counting and Monitoring of Animals. Adv. Intell. Syst. Comput. 2014, 291, 71–80. [Google Scholar]
- Longmore, S.; Collins, R.; Pfeifer, S.; Fox, S.; Mulero-Pázmány, M.; Bezombes, F.; Goodwin, A.; Juan Ovelar, M.; Knapen, J.; Wich, S. Adapting astronomical source detection software to help detect animals in thermal images obtained by unmanned aerial systems. Int. J. Remote Sens. 2017, 38, 2623–2638. [Google Scholar] [CrossRef]
- Rivas, A.; Chamoso, P.; González-Briones, A.; Corchado, J. Detection of Cattle Using Drones and Convolutional Neural Networks. Sensors 2018, 18, 2048. [Google Scholar] [CrossRef] [Green Version]
- Rahnemoonfar, M.; Dobbs, D.; Yari, M.; Starek, M. DisCountNet: Discriminating and Counting Network for Real-Time Counting and Localization of Sparse Objects in High-Resolution UAV Imagery. Remote Sens. 2019, 11, 1128. [Google Scholar] [CrossRef] [Green Version]
- Shao, W.; Kawakami, R.; Yoshihashi, R.; You, S.; Kawase, H.; Naemura, T. Cattle detection and counting in UAV images based on convolutional neural networks. Int. J. Remote Sens. 2020, 41, 31–52. [Google Scholar] [CrossRef] [Green Version]
- Jung, S.; Ariyur, K.B. Strategic Cattle Roundup using Multiple Quadrotor UAVs. Int. J. Aeronaut. Space Sci. 2017, 18, 315–326. [Google Scholar] [CrossRef]
- Nyamuryekung’e, S.; Cibils, A.; Estell, R.; Gonzalez, A. Use of an Unmanned Aerial Vehicle-Mounted Video Camera to Assess Feeding Behavior of Raramuri Criollo Cows. Rangel. Ecol. Manag. 2016, 69, 386–389. [Google Scholar] [CrossRef]
- Andrew, W.; Greatwood, C.; Burghardt, T. Aerial Animal Biometrics: Individual Friesian Cattle Recovery and Visual Identification via an Autonomous UAV with Onboard Deep Inference. arXiv 2019, arXiv:1907.05310v1. [Google Scholar]
- Webb, P.; Mehlhorn, S.A.; Smartt, P. Developing Protocols for Using a UAV to Monitor Herd Health. In Proceedings of the 2017 ASABE Annual International Meeting, Spokane, WA, USA, 16–19 July 2017; p. 1700865. [Google Scholar]
- Bock, C.; Poole, G.; Parker, P.; Gottwald, T. Plant disease severity estimated visually, by digital photography and image analysis, and by hyperspectral imaging. Crit. Rev. Plant Sci. 2010, 29, 59–107. [Google Scholar] [CrossRef]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556v6. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar]
- Howard, A.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv 2019, arXiv:1801.04381v4. [Google Scholar]
- Huang, G.; Liu, Z.; Maaten, L.; Weinberger, K. Densely Connected Convolutional Networks. arXiv 2018, arXiv:1608.06993v5. [Google Scholar]
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. arXiv 2017, arXiv:1610.02357v3. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. arXiv 2015, arXiv:1512.00567. [Google Scholar]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. arXiv 2016, arXiv:1602.07261v2. [Google Scholar]
- Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q. Learning Transferable Architectures for Scalable Image Recognition. arXiv 2018, arXiv:1707.07012v4. [Google Scholar]
CNN Architecture | Required Input Size | Reference |
---|---|---|
VGG-16 | None | Simonyan and Zisserman [26] |
VGG-19 | None | Simonyan and Zisserman [26] |
ResNet-50 v2 | None | He et al. [27] |
ResNet-101 v2 | None | He et al. [27] |
ResNet-152 v2 | None | He et al. [27] |
MobileNet | None | Howard et al. [28] |
MobileNet v2 | None | Sandler et al. [29] |
DenseNet 121 | None | Huang et al. [30] |
DenseNet 169 | None | Huang et al. [30] |
DenseNet 201 | None | Huang et al. [30] |
Xception | ≥75 × 75 pixels | Chollet [31] |
Inception v3 | ≥75 × 75 pixels | Szegedy et al. [32] |
Inception ResNet v2 | ≥75 × 75 pixels | Szegedy et al. [33] |
NASNet Mobile | 224 × 224 pixels | Zoph et al. [34] |
NASNet Large | 331 × 331 pixels | Zoph et al. [34] |
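All architectures listed above are available as ImageNet-pretrained models in tf.keras.applications, which makes a systematic comparison straightforward. The sketch below shows one way such backbones could be instantiated for feature extraction; using the Keras implementations and the global-average-pooling configuration are assumptions, not a statement of the authors’ exact setup.

```python
# Hypothetical sketch: instantiating the compared architectures from
# tf.keras.applications with ImageNet weights and no classification head.
import tensorflow as tf

ARCHITECTURES = {
    "VGG-16": tf.keras.applications.VGG16,
    "VGG-19": tf.keras.applications.VGG19,
    "ResNet-50 v2": tf.keras.applications.ResNet50V2,
    "ResNet-101 v2": tf.keras.applications.ResNet101V2,
    "ResNet-152 v2": tf.keras.applications.ResNet152V2,
    "MobileNet": tf.keras.applications.MobileNet,
    "MobileNet v2": tf.keras.applications.MobileNetV2,
    "DenseNet 121": tf.keras.applications.DenseNet121,
    "DenseNet 169": tf.keras.applications.DenseNet169,
    "DenseNet 201": tf.keras.applications.DenseNet201,
    "Xception": tf.keras.applications.Xception,           # input >= 75 x 75
    "Inception v3": tf.keras.applications.InceptionV3,    # input >= 75 x 75
    "Inception ResNet v2": tf.keras.applications.InceptionResNetV2,  # >= 75 x 75
    "NASNet Mobile": tf.keras.applications.NASNetMobile,  # fixed 224 x 224
    "NASNet Large": tf.keras.applications.NASNetLarge,    # fixed 331 x 331
}

def build_backbone(name, input_size):
    """Return an ImageNet-pretrained backbone with global average pooling."""
    constructor = ARCHITECTURES[name]
    return constructor(include_top=False, weights="imagenet",
                       input_shape=(input_size, input_size, 3), pooling="avg")
```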
CNN | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|
VGG-16 | 0.972 | 0.973 | 0.973 | 0.970 |
VGG-19 | 0.973 | 0.973 | 0.973 | 0.975 |
ResNet-50 v2 | 0.977 | 0.978 | 0.978 | 0.975 |
ResNet-101 v2 | 0.983 | 0.985 | 0.985 | 0.985 |
ResNet-152 v2 | 0.967 | 0.970 | 0.970 | 0.965 |
MobileNet | 0.983 | 0.980 | 0.980 | 0.983 |
MobileNet v2 | 0.787 | 0.855 | 0.790 | 0.778 |
DenseNet 121 | 0.852 | 0.895 | 0.868 | 0.865 |
DenseNet 169 | 0.935 | 0.943 | 0.933 | 0.935 |
DenseNet 201 | 0.935 | 0.945 | 0.938 | 0.938 |
Xception | 0.969 | 0.968 | 0.968 | 0.968 |
Inception v3 | 0.979 | 0.975 | 0.975 | 0.975 |
Inception ResNet v2 | 0.983 | 0.983 | 0.983 | 0.985 |
NASNet Mobile | 0.857 | 0.890 | 0.858 | 0.853 |
NASNet Large | 0.992 | 0.993 | 0.993 | 0.995 |
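Accuracy, precision, recall, and F1 score above summarize the binary “cattle”/“no cattle” classification over the cross-validation folds. As a point of reference, the sketch below shows how such metrics can be computed from true and predicted block labels; the averaging choice (macro) is an assumption, since the exact averaging used in the table is not restated here.

```python
# Hypothetical sketch: computing the reported metrics from true and
# predicted block labels (1 = cattle, 0 = no cattle).
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def summarize(y_true, y_pred):
    accuracy = accuracy_score(y_true, y_pred)
    # average="macro" is an assumption; per-class values are also possible.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```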
CNN | Input Size | Accuracy | Precision | Recall | F1 Score
---|---|---|---|---|---
VGG-16 | 224 × 224 | 0.892 | 0.903 | 0.895 | 0.893
VGG-16 | 112 × 112 | 0.911 | 0.918 | 0.915 | 0.910
VGG-16 | 56 × 56 | 0.878 | 0.888 | 0.880 | 0.878
VGG-19 | 224 × 224 | 0.891 | 0.900 | 0.893 | 0.890
VGG-19 | 112 × 112 | 0.910 | 0.915 | 0.910 | 0.908
VGG-19 | 56 × 56 | 0.888 | 0.898 | 0.885 | 0.885
ResNet-50 v2 | 224 × 224 | 0.848 | 0.883 | 0.848 | 0.845
ResNet-50 v2 | 112 × 112 | 0.939 | 0.940 | 0.943 | 0.940
ResNet-50 v2 | 56 × 56 | 0.796 | 0.815 | 0.793 | 0.788
ResNet-101 v2 | 224 × 224 | 0.893 | 0.910 | 0.895 | 0.890
ResNet-101 v2 | 112 × 112 | 0.908 | 0.913 | 0.908 | 0.910
ResNet-101 v2 | 56 × 56 | 0.848 | 0.875 | 0.850 | 0.845
ResNet-152 v2 | 224 × 224 | 0.827 | 0.868 | 0.820 | 0.815
ResNet-152 v2 | 112 × 112 | 0.914 | 0.890 | 0.885 | 0.890
ResNet-152 v2 | 56 × 56 | 0.832 | 0.865 | 0.835 | 0.830
MobileNet | 224 × 224 | 0.870 | 0.898 | 0.870 | 0.868
MobileNet | 112 × 112 | 0.937 | 0.943 | 0.938 | 0.938
MobileNet | 56 × 56 | 0.464 | 0.508 | 0.485 | 0.450
MobileNet v2 | 224 × 224 | 0.643 | 0.793 | 0.648 | 0.595
MobileNet v2 | 112 × 112 | 0.852 | 0.888 | 0.855 | 0.853
MobileNet v2 | 56 × 56 | 0.705 | 0.755 | 0.705 | 0.690
DenseNet 121 | 224 × 224 | 0.760 | 0.840 | 0.773 | 0.755
DenseNet 121 | 112 × 112 | 0.814 | 0.865 | 0.820 | 0.813
DenseNet 121 | 56 × 56 | 0.553 | 0.763 | 0.555 | 0.448
DenseNet 169 | 224 × 224 | 0.828 | 0.873 | 0.828 | 0.823
DenseNet 169 | 112 × 112 | 0.880 | 0.900 | 0.878 | 0.875
DenseNet 169 | 56 × 56 | 0.684 | 0.778 | 0.688 | 0.658
DenseNet 201 | 224 × 224 | 0.867 | 0.898 | 0.873 | 0.868
DenseNet 201 | 112 × 112 | 0.912 | 0.923 | 0.913 | 0.910
DenseNet 201 | 56 × 56 | 0.596 | 0.773 | 0.598 | 0.518
Xception | 224 × 224 | 0.946 | 0.948 | 0.943 | 0.943
Xception | 112 × 112 | 0.955 | 0.953 | 0.953 | 0.955
Xception | 56 × 56 | 0.855 | 0.865 | 0.858 | 0.855
Inception v3 | 224 × 224 | 0.908 | 0.908 | 0.910 | 0.908
Inception v3 | 112 × 112 | 0.842 | 0.843 | 0.838 | 0.838
Inception v3 | 56 × 56 | 0.565 | 0.740 | 0.703 | 0.648
Inception ResNet v2 | 224 × 224 | 0.925 | 0.933 | 0.925 | 0.923
Inception ResNet v2 | 112 × 112 | 0.891 | 0.890 | 0.890 | 0.890
Inception ResNet v2 | 56 × 56 | 0.900 | 0.898 | 0.895 | 0.890
NASNet Mobile | 224 × 224 | 0.880 | 0.905 | 0.883 | 0.880
NASNet Mobile | 112 × 112 | 0.851 | 0.885 | 0.855 | 0.850
NASNet Mobile | 56 × 56 | 0.825 | 0.858 | 0.818 | 0.810
NASNet Large | 224 × 224 | 0.958 | 0.963 | 0.960 | 0.958
NASNet Large | 112 × 112 | 0.962 | 0.963 | 0.963 | 0.963
NASNet Large | 56 × 56 | 0.964 | 0.965 | 0.965 | 0.965
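The coarser input sizes in the table correspond to downsampled blocks that simulate proportionally larger GSDs (see the figure comparing the two possible orderings of gridding and downsampling). The sketch below illustrates the grid-then-downsample variant; the block size, downsampling factors, interpolation method, and file name are assumptions used only for illustration.

```python
# Hypothetical sketch: simulating coarser ground sample distances (GSDs) by
# downsampling 224 x 224 image blocks to 112 x 112 and 56 x 56.
from PIL import Image

def grid_blocks(image, block_size=224):
    """Split an image into non-overlapping blocks using a regular grid."""
    width, height = image.size
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            yield image.crop((left, top, left + block_size, top + block_size))

def simulate_gsd(block, factor):
    """Downsample a block by `factor` to simulate a proportionally coarser GSD."""
    new_size = (block.width // factor, block.height // factor)
    return block.resize(new_size, Image.BILINEAR)

# Usage example (hypothetical file name): 2x coarser GSD for every block.
# image = Image.open("uav_mosaic.png")
# blocks_half_gsd = [simulate_gsd(b, 2) for b in grid_blocks(image)]
```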
CNN Architecture | 224 × 224 | 112 × 112 | 56 × 56 |
---|---|---|---|
VGG-16 | 65 | 14 | 6 |
VGG-19 | 74 | 17 | 6 |
ResNet-50 v2 | 41 | 11 | 7 |
ResNet-101 v2 | 68 | 19 | 8 |
ResNet-152 v2 | 94 | 27 | 12 |
MobileNet | 18 | 8 | 7 |
MobileNet v2 | 20 | 8 | 7 |
DenseNet 121 | 56 | 15 | 9 |
DenseNet 169 | 65 | 18 | 10 |
DenseNet 201 | 81 | 22 | 11 |
Xception | 60 | 13 | - |
Inception v3 | 29 | 9 | - |
Inception ResNet v2 | 72 | 19 | - |
NASNet Mobile | 35 | - | - |
NASNet Large | 334 * | - | - |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).