Abstract
Cross-media retrieval returns heterogeneous multimedia data that share the semantics of a query object, and its key problem is how to model the correlations among heterogeneous multimedia data. Many works map data of different modalities into an isomorphic space so that the similarities between them can be measured. Inspired by this idea, we propose a joint graph regularization based modality-dependent cross-media retrieval approach (JGRMDCR), which takes into account the one-to-one correspondence between data pairs of different modalities, the inter-modality similarities, and the intra-modality similarities. In addition, the method learns different projection matrices for different retrieval tasks, according to the modality of the query object. Experimental results on benchmark datasets show that the proposed approach outperforms other state-of-the-art algorithms.
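The retrieval step that follows from a shared space can be illustrated with a minimal sketch. It is not the JGRMDCR training procedure: the projection matrices below are random placeholders standing in for learned ones, and the names `cosine_rank`, `retrieve`, `projections`, the task labels, and the feature dimensions are all hypothetical. It only shows how one pair of projection matrices per retrieval task can be used to rank items of the other modality by cosine similarity.

```python
import numpy as np


def cosine_rank(query, gallery, P_query, P_gallery):
    """Project a query and a gallery into a common space and rank the gallery.

    query:     (d_q,) feature vector of the query modality
    gallery:   (n, d_g) feature matrix of the other modality
    P_query:   (d_q, d_common) projection for the query modality
    P_gallery: (d_g, d_common) projection for the gallery modality
    Returns gallery indices sorted from most to least similar, and the similarities.
    """
    q = query @ P_query
    G = gallery @ P_gallery
    # Normalise so that the dot product equals cosine similarity.
    q = q / (np.linalg.norm(q) + 1e-12)
    G = G / (np.linalg.norm(G, axis=1, keepdims=True) + 1e-12)
    sims = G @ q
    return np.argsort(-sims), sims


# Assumption: placeholder matrices; in practice these would be learned per task.
rng = np.random.default_rng(0)
d_img, d_txt, d_common = 8, 5, 3
projections = {
    # task name -> (image projection, text projection)
    "image_to_text": (rng.standard_normal((d_img, d_common)),
                      rng.standard_normal((d_txt, d_common))),
    "text_to_image": (rng.standard_normal((d_img, d_common)),
                      rng.standard_normal((d_txt, d_common))),
}


def retrieve(task, query, gallery):
    """Pick the projection pair that matches the query modality, then rank."""
    P_img, P_txt = projections[task]
    if task == "image_to_text":
        return cosine_rank(query, gallery, P_img, P_txt)
    return cosine_rank(query, gallery, P_txt, P_img)


if __name__ == "__main__":
    image_query = rng.standard_normal(d_img)
    text_gallery = rng.standard_normal((4, d_txt))
    order, sims = retrieve("image_to_text", image_query, text_gallery)
    print(order, np.round(sims, 3))
```

Keeping a separate projection pair per task mirrors the modality-dependent idea in the abstract: the image-to-text and text-to-image directions do not have to share the same mapping.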






Acknowledgement
This work is partially supported by the National Natural Science Foundation of China (Nos. 61373081, 61572298, 61402268, 61401260, 61601268), the Key Research and Development Foundation of Shandong Province (No. 2016GGX101009), and the Natural Science Foundation of Shandong, China (Nos. BS2014DX006, ZR2014FM012). We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the TITAN X GPU used for this research.
Cite this article
Yan, J., Zhang, H., Sun, J. et al. Joint graph regularization based modality-dependent cross-media retrieval. Multimed Tools Appl 77, 3009–3027 (2018). https://doi.org/10.1007/s11042-017-4918-0
DOI: https://doi.org/10.1007/s11042-017-4918-0