Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking
Figure 1. Illustration of the complementary information between multi-modal images.
Figure 2. Comparison of the model proposed in this paper with other advanced RoI-based RGB-T tracking models when the target undergoes large deformation (the yellow box denotes the ground truth). Our MACFT model accurately extracts target features and, because it is not restricted by the RoI scale, generates bounding boxes closer to the ground truth.
Figure 3. Overview of the MACFT model, which consists of three parts: modal-specific/shared feature extraction, inter-modality information fusion, and bounding box regression.
Figure 4. Schematic diagram of the SSFI module. It consists of two CAM modules and N = 6 MAM modules. The CAM modules use cross-attention to aggregate features from the visible-specific/thermal-shared branch and the thermal-specific/visible-shared branch; the MAM modules further enhance the aggregated features with mixed attention.
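The two attention patterns named in the caption above can be sketched as follows. This is a minimal NumPy illustration of cross-attention (CAM-style) versus mixed attention (MAM-style), under simplifying assumptions: identity projections, a single head, and no normalization layers; the function names `cam` and `mam` are illustrative and are not taken from the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(Q, K, V):
    # scaled dot-product attention
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def cam(tokens_a, tokens_b):
    # cross-attention: queries come from branch A, keys/values from branch B,
    # so branch A aggregates complementary features from branch B
    return attend(tokens_a, tokens_b, tokens_b)

def mam(tokens_a, tokens_b):
    # mixed attention: both token sets are concatenated and attend jointly,
    # capturing intra- and inter-branch interactions in a single step
    mixed = np.concatenate([tokens_a, tokens_b], axis=0)
    out = attend(mixed, mixed, mixed)
    return out[: len(tokens_a)], out[len(tokens_a):]
```

The key design difference: cross-attention keeps the two branches' token sets separate (one only queries the other), while mixed attention lets every token attend to every other, at the cost of a larger attention matrix.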
Figure 5. Evaluation results on the LasHeR dataset: (a) precision rate curve; (b) success rate curve.
Figure 6. A visual example of the effectiveness of the proposed tracking method. (a–c) show three challenging sequences from the LasHeR dataset, together with the tracking results of other advanced RGB-T trackers for comparison.
Figure 7. Comprehensive comparison of tracking success rate and speed on the RGBT234 dataset. MACFT achieves the highest tracking success rate while maintaining real-time operation (33.3 FPS on average).
Figure 8. Visualization of the attention weights of different branches when one of the modalities is of low quality.
Abstract
1. Introduction
2. Related Work
2.1. RGB Tracking Methods
2.2. RGB-T Tracking Methods
3. Method
3.1. Transformer-Based Complementary Feature Extraction Backbone
3.1.1. Modal-Specific Feature Extraction Branch
3.1.2. Modal-Shared Feature Extraction Branch
3.2. Modality Shared-Specific Feature Interaction Module
3.3. Target Localization Regression Network
3.4. Training Method
3.4.1. Training in Stages
3.4.2. Loss Function
3.5. Inference
4. Experiments
4.1. Implementation Details
4.1.1. Experimental Platform Setup
4.1.2. Parameter Details
4.2. Public Dataset Evaluation
4.2.1. Evaluation Metrics
4.2.2. Evaluation on RGBT234 Dataset
4.2.3. Evaluation on LasHeR Dataset
4.2.4. Evaluation on VTUAV Dataset
4.2.5. Evaluation Based on Challenge Attributes
4.2.6. Qualitative Evaluation
4.2.7. Comprehensive Evaluation of Efficiency and Effectiveness
4.3. Visualization
4.4. Ablation Study
4.4.1. Model Pruning Experiments
4.4.2. Parameter Tuning Experiments
4.4.3. Parameter Size and Inference Speed
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Symbols | Explanations |
---|---|
Concat | Concatenates two tensors along the same dimension. |
WQ, WK, WV | The three weight matrices used by the transformer self-attention mechanism to produce the queries, keys, and values. |
d | The dimension of WQ and WK. |
softmax | The normalized exponential function, which maps each element into the range (0, 1). |
KL | The Kullback–Leibler divergence. |
DR | Dimensionality reduction through a linear layer. |
softArgmax | A differentiable approximation of argmax computed via softmax. |
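To make the symbols above concrete, the following NumPy sketch implements scaled dot-product self-attention and a softArgmax. The function names and shapes are illustrative assumptions, not the paper's implementation; multi-head structure and learned parameters are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # normalized exponential; maps each element into (0, 1), rows sum to 1
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, WQ, WK, WV):
    # project the input tokens to queries, keys, and values
    Q, K, V = X @ WQ, X @ WK, X @ WV
    d = WQ.shape[1]  # d is the shared dimension of WQ and WK
    A = softmax(Q @ K.T / np.sqrt(d))  # attention weights in (0, 1)
    return A @ V

def soft_argmax(x):
    # differentiable argmax: softmax-weighted expectation of the index
    return softmax(x) @ np.arange(len(x))
```

For a sharply peaked input, `soft_argmax` converges to the ordinary argmax index, which is what makes it usable inside an end-to-end trained localization head.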
References
- Xiao, Y.; Yang, M.; Li, C.; Liu, L.; Tang, J. Attribute-Based Progressive Fusion Network for RGBT Tracking. Proc. AAAI Conf. Artif. Intell. 2022, 36, 2831–2838. [Google Scholar] [CrossRef]
- Tang, Z.; Xu, T.; Wu, X.-J. A Survey for Deep RGBT Tracking. arXiv 2022, arXiv:2201.09296. [Google Scholar]
- Zhang, H.; Zhang, L.; Zhuo, L.; Zhang, J. Object Tracking in RGB-T Videos Using Modal-Aware Attention Network and Competitive Learning. Sensors 2020, 20, 393. [Google Scholar] [CrossRef] [Green Version]
- Zhang, P.; Wang, D.; Lu, H.; Yang, X. Learning Adaptive Attribute-Driven Representation for Real-Time RGB-T Tracking. Int. J. Comput. Vis. 2021, 129, 2714–2729. [Google Scholar] [CrossRef]
- Zhang, X.; Ye, P.; Peng, S.; Liu, J.; Gong, K.; Xiao, G. SiamFT: An RGB-Infrared Fusion Tracking Method via Fully Convolutional Siamese Networks. IEEE Access 2019, 7, 122122–122133. [Google Scholar] [CrossRef]
- Bhat, G.; Danelljan, M.; Van Gool, L.; Timofte, R. Learning Discriminative Model Prediction for Tracking. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 6181–6190. [Google Scholar]
- Zhu, Y.; Li, C.; Tang, J.; Luo, B.; Wang, L. RGBT Tracking by Trident Fusion Network. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 579–592. [Google Scholar] [CrossRef]
- Nam, H.; Han, B. Learning Multi-domain Convolutional Neural Networks for Visual Tracking. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 4293–4302. [Google Scholar]
- Lu, A.; Li, C.; Yan, Y.; Tang, J.; Luo, B. RGBT Tracking via Multi-Adapter Network with Hierarchical Divergence Loss. IEEE Trans. Image Process. 2021, 30, 5613–5625. [Google Scholar] [CrossRef] [PubMed]
- Li, C.; Lu, A.; Zheng, A.; Tu, Z.; Tang, J. Multi-adapter RGBT tracking. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Zhu, Y.; Li, C.; Tang, J.; Luo, B. Quality-Aware Feature Aggregation Network for Robust RGBT Tracking. IEEE Trans. Intell. Veh. 2021, 6, 121–130. [Google Scholar] [CrossRef]
- Lu, A.; Qian, C.; Li, C.; Tang, J.; Wang, L. Duality-Gated Mutual Condition Network for RGBT Tracking. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–14. [Google Scholar] [CrossRef] [PubMed]
- Gao, Y.; Li, C.; Zhu, Y.; Tang, J.; He, T.; Wang, F. Deep Adaptive Fusion Network for High Performance RGBT Tracking. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 91–99. [Google Scholar]
- Zhu, Y.; Li, C.; Luo, B.; Tang, J.; Wang, X. Dense Feature Aggregation and Pruning for RGBT Tracking. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019; pp. 465–472. [Google Scholar]
- Li, C.; Liu, L.; Lu, A.; Ji, Q.; Tang, J. Challenge-Aware RGBT Tracking. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28 2020, Proceedings, Part XXII 16; Springer: Berlin/Heidelberg, Germany, 2020; pp. 222–237. [Google Scholar]
- Luo, W.; Li, Y.; Urtasun, R.; Zemel, R. Understanding the Effective Receptive Field in Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2016, 29, 4905–4913. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Danelljan, M.; Bhat, G.; Khan, F.S.; Felsberg, M. Atom: Accurate tracking by overlap maximization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4660–4669. [Google Scholar]
- Jung, I.; Son, J.; Baek, M.; Han, B. Real-Time MDNet. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 83–98. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
- Li, B.; Yan, J.; Wu, W.; Zhu, Z.; Hu, X. High Performance Visual Tracking with Siamese Region Proposal Network. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 8971–8980. [Google Scholar] [CrossRef]
- Bertinetto, L.; Valmadre, J.; Henriques, J.F.; Vedaldi, A.; Torr, P.H.S. Fully-Convolutional Siamese Networks for Object Tracking. In Computer Vision—ECCV 2016 Workshops; Hua, G., Jégou, H., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; Volume 9914, pp. 850–865. [Google Scholar] [CrossRef] [Green Version]
- Li, B.; Wu, W.; Wang, Q.; Zhang, F.; Xing, J.; Yan, J. SiamRPN++: Evolution of Siamese Visual Tracking with Very Deep Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 4277–4286. [Google Scholar] [CrossRef] [Green Version]
- Chen, X.; Yan, B.; Zhu, J.; Wang, D.; Yang, X.; Lu, H. Transformer Tracking. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 8122–8131. [Google Scholar] [CrossRef]
- Yan, B.; Peng, H.; Fu, J.; Wang, D.; Lu, H. Learning Spatio-Temporal Transformer for Visual Tracking. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 10428–10437. [Google Scholar] [CrossRef]
- Cui, Y.; Jiang, C.; Wang, L.; Wu, G. MixFormer: End-to-End Tracking with Iterative Mixed Attention. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 13598–13608. [Google Scholar] [CrossRef]
- Zhou, Y.; Kumar, A.; Parkash, C.; Vashishtha, G.; Tang, H.; Xiang, J. A novel entropy-based sparsity measure for prognosis of bearing defects and development of a sparsogram to select sensitive filtering band of an axial piston pump. Measurement 2022, 203, 111997. [Google Scholar] [CrossRef]
- Zheng, G.; Chen, W.; Qian, Q.; Kumar, A.; Sun, W.; Zhou, Y. TCM in milling processes based on attention mechanism-combined long short-term memory using a sound sensor under different working conditions. Int. J. Hydromechatron. 2022, 5, 243–259. [Google Scholar] [CrossRef]
- Chen, B.; Li, P.; Bai, L.; Qiao, L.; Shen, Q.; Li, B.; Gan, W.; Wu, W.; Ouyang, W. Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27 2022, Proceedings, Part XXII; Springer: Berlin/Heidelberg, Germany, 2022; pp. 375–392. [Google Scholar]
- Wang, Y.; Li, C.; Tang, J. Learning soft-consistent correlation filters for RGB-T object tracking. In Chinese Conference on Pattern Recognition and Computer Vision (PRCV); Springer: Berlin/Heidelberg, Germany, 2018; pp. 295–306. [Google Scholar]
- Zhang, P.; Zhao, J.; Bo, C.; Wang, D.; Lu, H.; Yang, X. Jointly Modeling Motion and Appearance Cues for Robust RGB-T Tracking. IEEE Trans. Image Process. 2021, 30, 3335–3347. [Google Scholar] [CrossRef] [PubMed]
- Danelljan, M.; Bhat, G.; Khan, F.S.; Felsberg, M. ECO: Efficient Convolution Operators for Tracking. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 6931–6939. [Google Scholar] [CrossRef] [Green Version]
- Zhang, T.; Liu, X.; Zhang, Q.; Han, J. SiamCDA: Complementarity-and distractor-aware RGB-T tracking based on Siamese network. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 1403–1417. [Google Scholar] [CrossRef]
- Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 510–519. [Google Scholar]
- Zhang, P.; Zhao, J.; Wang, D.; Lu, H.; Ruan, X. Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 8876–8885. [Google Scholar] [CrossRef]
- Zhang, L.; Danelljan, M.; Gonzalez-Garcia, A.; van de Weijer, J.; Shahbaz Khan, F. Multi-Modal Fusion for End-to-End RGB-T Tracking. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 2252–2261. [Google Scholar] [CrossRef] [Green Version]
- Zhu, J.; Lai, S.; Chen, X.; Wang, D.; Lu, H. Visual prompt multi-modal tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 9516–9526. [Google Scholar]
- Jia, M.; Tang, L.; Chen, B.-C.; Cardie, C.; Belongie, S.; Hariharan, B.; Lim, S.-N. Visual prompt tuning. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 709–727. [Google Scholar]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, Virtual Event, 18–24 July 2021; pp. 8748–8763. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229. [Google Scholar]
- Li, C.; Liang, X.; Lu, Y.; Zhao, N.; Tang, J. RGB-T Object Tracking: Benchmark and Baseline. Pattern Recognit. 2019, 96, 106977. [Google Scholar] [CrossRef] [Green Version]
- Li, C.; Xue, W.; Jia, Y.; Qu, Z.; Luo, B.; Tang, J.; Sun, D. LasHeR: A Large-scale High-diversity Benchmark for RGBT Tracking. IEEE Trans. Image Process. 2022, 31, 392–404. [Google Scholar] [CrossRef] [PubMed]
- Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
- Huang, L.; Zhao, X.; Huang, K. GlobalTrack: A Simple and Strong Baseline for Long-Term Tracking. Proc. AAAI Conf. Artif. Intell. 2020, 34, 11037–11044. [Google Scholar] [CrossRef]
- Kristan, M.; Matas, J.; Leonardis, A.; Felsberg, M.; Pflugfelder, R.; Kamarainen, J.-K.; Cehovin Zajc, L.; Drbohlav, O.; Lukezic, A.; Berg, A.; et al. The Seventh Visual Object Tracking VOT2019 Challenge Results. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 2206–2241. [Google Scholar]
Metrics | APFNet | TFNet | DAFNet | DMCNet | MANet++ | SiamCDA | ViPT | JMMAC | MaCNet | MACFT |
---|---|---|---|---|---|---|---|---|---|---|
PR (↑) | 0.827 | 0.806 | 0.796 | 0.839 | 0.795 | 0.760 | 0.835 | 0.790 | 0.790 | 0.857 |
SR (↑) | 0.579 | 0.560 | 0.544 | 0.593 | 0.559 | 0.569 | 0.617 | 0.573 | 0.554 | 0.622 |
Trackers | PR | SR | FPS |
---|---|---|---|
MACFT | 80.1% | 66.8% | 31.7 |
HMFT [35] | 75.8% | 62.7% | 30.2 |
ADRNet [4] | 62.2% | 46.6% | 25.0 |
mfDiMP [36] | 67.3% | 55.4% | 28.0 |
DAFNet [13] | 62.0% | 45.8% | 21.0 |
FSRPN [45] | 65.3% | 54.4% | 30.3 |
Trackers | PR | SR | FPS |
---|---|---|---|
MACFT | 54.1% | 46.7% | 31.7 |
HMFT-LT [35] | 53.6% | 46.1% | 8.1 |
HMFT [35] | 41.4% | 35.5% | 30.2 |
ADRNet [4] | 17.5% | 25.3% | 25.0 |
mfDiMP [36] | 31.5% | 27.2% | 28.0 |
DAFNet [13] | 25.3% | 18.8% | 21.0 |
FSRPN [45] | 36.6% | 31.4% | 30.3 |
Attributes | DMCNet | ViPT | APFNet | TFNet | MaCNet | DAFNet | JMMAC | MACFT |
---|---|---|---|---|---|---|---|---|
NO | 93.7/70.6 | 94.5/74.2 | 96.0/70.5 | 92.1/69.0 | 94.0/68.4 | 83.6/60.0 | 91.6/70.2 | 97.3/73.0 |
PO | 89.9/63.0 | 84.4/63.3 | 85.8/60.4 | 80.8/56.7 | 81.1/56.5 | 80.1/55.9 | 79.9/59.4 | 85.7/63.9 |
HO | 71.5/51.2 | 77.1/57.2 | 75.0/51.6 | 68.7/46.6 | 71.9/48.5 | 65.6/42.8 | 67.8/47.6 | 77.8/56.1 |
LI | 85.3/59.2 | 79.1/57.2 | 86.8/58.5 | 81.5/54.8 | 81.4/54.0 | 82.5/55.1 | 88.4/61.8 | 82.7/58.3 |
LR | 83.2/58.1 | 84.6/61.3 | 83.8/57.3 | 81.0/53.8 | 77.9/53.1 | 77.6/51.9 | 75.3/51.4 | 84.2/58.7 |
TC | 87.0/61.9 | 88.9/66.2 | 84.4/59.1 | 80.1/57.2 | 81.5/59.0 | 81.8/57.5 | 72.7/50.4 | 82.7/60.7 |
DEF | 75.4/54.9 | 81.3/63.4 | 79.4/56.8 | 71.4/50.9 | 73.5/51.4 | 65.1/44.2 | 68.0/51.2 | 83.4/63.8 |
FM | 81.0/53.6 | 89.7/64.8 | 86.6/55.6 | 76.1/47.4 | 80.9/49.2 | 65.6/41.5 | 68.0/36.6 | 85.0/59.6 |
SC | 82.3/59.8 | 82.6/62.9 | 83.7/58.5 | 77.6/56.0 | 78.3/55.7 | 73.7/50.9 | 81.0/60.6 | 86.1/64.1 |
MB | 79.4/60.3 | 86.8/67.7 | 80.1/60.2 | 71.2/53.3 | 75.8/57.4 | 62.9/46.1 | 74.3/56.7 | 86.9/65.5 |
CM | 81.2/59.4 | 85.4/65.0 | 81.0/59.1 | 76.0/54.1 | 75.9/54.2 | 68.1/47.0 | 74.2/55.0 | 88.4/65.2 |
BC | 82.1/54.7 | 83.7/59.1 | 81.0/54.1 | 80.2/50.7 | 79.6/49.0 | 78.3/45.7 | 63.8/44.3 | 85.0/57.9 |
ALL | 83.9/59.3 | 83.5/61.7 | 82.7/57.9 | 80.6/56.0 | 79.0/55.4 | 79.6/54.4 | 79.0/57.3 | 85.7/62.2 |
Trackers | LasHeR PR (↓) | LasHeR SR (↓) | RGBT234 PR (↓) | RGBT234 SR (↓) |
---|---|---|---|---|
MACFT (B-T) | 42.8% (−22.5%) | 33.2% (−18.2%) | 67.4% (−18.3%) | 48.8% (−13.4%) |
MACFT (B-RGB) | 57.8% (−7.5%) | 46.4% (−5.0%) | 80.0% (−5.7%) | 58.2% (−4.0%) |
MACFT (DM) | 62.2% (−3.1%) | 48.7% (−2.7%) | 83.7% (−2.0%) | 58.9% (−3.3%) |
MACFT (DM + CAM) | 63.8% (−1.5%) | 50.3% (−1.1%) | 83.7% (−2.0%) | 60.7% (−1.5%) |
MACFT (DM + MAM) | 63.9% (−1.4%) | 50.2% (−1.2%) | 83.9% (−1.8%) | 60.7% (−1.5%) |
MACFT (DM + CAM + COM) | 64.3% (−1.0%) | 50.7% (−0.7%) | 85.3% (−0.4%) | 61.6% (−0.6%) |
MACFT (w/o-FT) | 64.3% (−1.0%) | 50.8% (−0.6%) | 84.0% (−1.7%) | 60.9% (−1.3%) |
MACFT | 65.3% | 51.4% | 85.7% | 62.2% |
Number of MAM Blocks | PR | SR |
---|---|---|
4 | 64.80% | 51.00% |
5 | 64.90% | 51.20% |
6 | 65.30% | 51.40% |
7 | 65.00% | 51.20% |
8 | 64.80% | 51.10% |
Trackers | Speed (FPS) | Params * | SR |
---|---|---|---|
MACFT | 33.3 | 59.6 M | 65.3% |
MACFT (DM + CAM + COM) | 35.5 | 59.6 M | 64.3% |
MACFT (DM + CAM) | 50.5 | 45.4 M | 63.8% |
Share and Cite
Luo, Y.; Guo, X.; Dong, M.; Yu, J. Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking. Sensors 2023, 23, 6609. https://doi.org/10.3390/s23146609