Abstract
In the wake of first-person engagement and crowdsourced content creation, images have acquired abundant subjective dimensions of information, especially emotional ones. This research proposes an approach to image emotional semantic retrieval based on a cognitive quantification model built from tags. "Daqi", a typical Chinese emotional experience, is taken as an example: an emotional quantification model of it is constructed through semantic association analysis and statistical data analysis. Verification experiments indicate that ranking images and recommending tags in an image emotional retrieval system based on the cognitive model is practical and effective. It is foreseeable that the theory of this research can be applied to other social digital resources, such as music or video.
1 Introduction
With the rapid development of Web 2.0, first-person engagement and crowdsourced content creation have boomed as new paradigms of interaction. Digital media resources, especially images, have acquired abundant subjective and expressive dimensions of cognitive content, which motivates research on retrieving emotional information. Human perception and understanding of image emotional information operate mainly on the semantic level, yet the "semantic gap" between low-level image features and high-level emotional semantics can hardly be bridged completely.
This research aims to propose an approach to tag-based retrieval of emotional image content. Tags are descriptions added to images directly by users, so image emotional semantic retrieval can be implemented with text retrieval technology, without extracting image features. To increase the number of tags, the channels of tag generation are expanded to include relevant user interaction behaviors.
In this research, a cognitive quantification model of images' emotional qualities, as perceived by users, is constructed to organize and manipulate social image resources. The model is also applied to emotional semantic tag recommendation, which helps improve the efficiency of image annotation and the validity of image recommendation.
This research proposes a novel mechanism for the emotional semantic retrieval of images. The mechanism has the following advantages:
- Retaining users' subjective views of images as far as possible by using user-generated tags, which ensures the credibility of retrieval and keeps the approach lightweight.
- Binding users' behaviors and views in image retrieval based on behavioral psychology, which expands tag sources and provides more data for modeling.
- Mining the latent emotional semantics of images with the cognitive quantification model, which improves the effectiveness of emotional semantic retrieval; meanwhile, diverse associated tags can be recommended according to their weights in the model, further improving emotional image annotation.
The rest of the paper is organized as follows. An overview of the tag-based image emotional semantic retrieval mechanism is presented in Sect. 2. Section 3 discusses the details of the methods used in the mechanism. To verify the rationality of the proposed mechanism, small-sample experiments were conducted; the process and analyses are presented in Sect. 4. Section 5 gives the summary and outlook.
2 Concepts and Methods
2.1 Concepts
Image Tag.
Tags are keywords added by users to describe image contents. In particular, tags are not only labels but also keywords appearing in titles, comments, and so on. Because tags are textual descriptors of images, images can be recommended directly with text retrieval technology, with no need to extract and analyze image features.
An image can carry multiple tags, and a tag can describe multiple images. Users perform the annotation behaviors, creating associations between images and tags.
At present, annotation behaviors are as follows:
- Adding the title or labels when uploading images;
- Adding labels or grouping when collecting images;
- Making comments on images.
The tags generated by the above behaviors are "explicit tags". In fact, these annotation behaviors are optional and costly for users, who often lack motivation; a large portion of users only view images without leaving a tag.
However, studies have shown that users' image retrieval behaviors reflect how much they agree with the retrieval results, revealing the relevance between the retrieval keywords and the images. Using data on users' browsing behaviors, "implicit tags" can be generated. The details are discussed in Sect. 2.2.
Emotional Quantification Model.
Image semantics has several levels. Emotional semantics lies at the highest level of abstract semantics and can be defined as the semantics describing the intensity and type of feelings, moods, affections, or sensibilities evoked in humans by viewing images. It is usually expressed in adjective form: romantic, brilliant, etc.
Constructing an image emotional computational model usually involves three parts:
- extracting the image perceptual features that can stimulate users' emotions;
- establishing an emotional recognition mechanism to bridge the semantic gap between low-level visual features and high-level emotional semantics;
- constructing a model to represent image emotional semantics that meets the needs of users' queries.
Visual recognition and machine learning are the main methods for the first two parts. They aim to build an easily retrievable association between images and their emotional semantics, which can be implemented simply through tags.
Based on tags, this paper focuses on constructing a model that quantifies the image emotions users search for.
In general emotional semantic models, a specific emotion is decomposed into, and associated with, six basic emotion dimensions: anger, disgust, fear, joy, sadness, and surprise. Models relying on these six basic dimensions lack the emotional granularity to represent complex emotional semantics or to distinguish clearly between related emotions.
Learning from this idea, in this paper emotional semantics are represented by more flexible and more targeted "emotional dimensions", each containing a variety of interrelated "emotional elements". The emotional semantic quantification model is expressed in terms of emotional dimensions, and each emotional dimension is extracted from emotional elements.
2.2 Methods
Users’ Retrieval Behavior.
As mentioned in the previous section, adding image tags is optional and costly for users, and most users only browse images without leaving a tag. Studies have shown that users' retrieval behaviors reflect their degree of recognition of the search results, that is, in an image retrieval system, the relevance between the search keywords and the images.
Using data on users' retrieval behaviors, it can be predicted whether an image is associated with the search keyword; if so, the search keyword can be added to the image as an "implicit tag".
When users retrieve images, the operations that generate "implicit tags" are as follows:
- clicking to view an image after retrieval;
- downloading/saving an image after retrieval;
- taking a snapshot of an image after retrieval.
Among these, we remain neutral on the click-to-view operation, because clicks driven by curiosity or similar motives, rather than recognition, cannot be excluded.
Combining these with the behaviors generating "explicit tags" mentioned in the previous section, the relationship between tags generated by users' behaviors ("explicit tags" and "implicit tags") and images is divided into three levels: relevant (1), neutral (0.5), and non-relevant (0), as follows (Fig. 1).
A user may annotate an image more than once, but relevance degrees are not cumulative beyond 1; that is, as long as there is at least one strong annotation behavior (relevance 1), the tag is considered added to the image by that user. A minimal Python sketch of this mapping follows; the behavior names are illustrative assumptions, while the 1/0.5/0 levels and the non-cumulative rule come from the text above.
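```python
# Illustrative sketch of the three-level relevance scheme described above.
# Behavior names are assumptions for illustration; only the 1 / 0.5 / 0
# levels and the non-cumulative "max" rule come from the text.

RELEVANCE = {
    "add_title": 1.0,    # explicit: title/label on upload
    "add_label": 1.0,    # explicit: label/group on collection
    "comment": 1.0,      # explicit: comment on the image
    "download": 1.0,     # implicit: download/save after retrieval
    "snapshot": 1.0,     # implicit: snapshot after retrieval
    "click_view": 0.5,   # implicit: click to view -- treated as neutral
}

def tag_relevance(behaviors):
    """Relevance of a (user, tag, image) triple given observed behaviors.

    Degrees are not cumulative: one strong behavior (1.0) is enough,
    so we take the maximum rather than the sum.
    """
    return max((RELEVANCE.get(b, 0.0) for b in behaviors), default=0.0)

# A user who clicked an image and later downloaded it: relevance 1.0.
print(tag_relevance(["click_view", "download"]))  # -> 1.0
```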
Tag Clustering Analysis.
Clustering is a common data analysis tool and a basic algorithm in data mining. The essence of clustering analysis is to divide data into several clusters according to relevance, so that similarity is high within clusters and low between clusters.
Tag clustering can be used to find semantically related tags in social annotation systems (Begelman et al.). Each cluster can largely be represented by a principal tag identifying its subject; if the clusters together constitute a specific emotion, their principal tags are its "emotional elements".
The semantic relevance of two tags can be obtained from semantic knowledge databases, such as WordNet (for English) and CSC (for Chinese), to build a semantic correlation matrix (Fig. 2).
Using the semantic relevance coefficients in the matrix to build an N-dimensional space, the Euclidean distance formula (1) can be used to calculate the spatial distance between two tags: the closer the distance, the more similar the tags.
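Formula (1) is not reproduced in this version of the text. A plausible reconstruction, assuming r_ik denotes the semantic relevance coefficient between tags T_i and T_k in the matrix, is:

$$ d(T_i, T_j) = \sqrt{\sum_{k=1}^{N} \left(r_{ik} - r_{jk}\right)^{2}} \quad (1) $$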
The two closest clusters are repeatedly merged into a larger cluster until all small clusters have been merged into one; the whole process can be shown as a tree structure. Any number of semantic groups can then be obtained by cutting the hierarchical clustering tree (Fig. 3). A sketch of this step is given below.
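As a sketch of how this hierarchical clustering step could be implemented, the following Python code (assuming `sim` is the semantic correlation matrix of Fig. 2, with values in 0-1) builds the tree with SciPy and cuts it into a chosen number of groups; Ward linkage matches the method named in Sect. 4.2.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_tags(sim: np.ndarray, n_clusters: int) -> np.ndarray:
    """Cut the agglomerative clustering tree into n_clusters groups."""
    dist = 1.0 - sim                  # similarity in [0,1] -> distance
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="ward")  # the dendrogram of Fig. 3
    # labels[i] is the semantic group index of tag i.
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```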
Factor Analysis of Emotional Cognition.
The main purpose of factor analysis is to reduce dimensionality by condensing many indicators into a few composite indicators with little loss of information.
As the number of tags increases, redundancy and uniqueness must be considered when matching images in the emotional space. Using factor analysis, an orthogonal emotional space can be constructed that retains most of the meaning of the original indicators while keeping the model simple.
At the same time, the weight of each emotional dimension is allocated according to the contribution rate of each factor rather than by subjective judgment, which makes the model more objective and reasonable.
A tag-image matrix S = {s_im} is created based on the website's image-tag database (Fig. 4).
s_im is the score of image I_i on tag T_m, determined by the number of occurrences of T_m on I_i. Since tag counts on different images can differ by orders of magnitude, the scores need to be standardized. For an image I_a with tag counts N = {n_1, n_2, …, n_m}, its standardized score s_am is computed as follows.
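The standardization formula itself is not reproduced here. One natural form, assuming each count is normalized by the image's total number of tags, would be:

$$ s_{am} = \frac{n_m}{\sum_{k=1}^{m} n_k} $$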
After factor analysis, the factors F = {F_1, F_2, …, F_n}, that is, the "emotional dimensions", and their variance contribution rates A = {a_1, a_2, …, a_n} are obtained. The emotion Y can then be represented by the emotional cognitive factors F, as in (2).
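Formula (2) is likewise missing from this version. Given the variance contribution rates A, a plausible weighted-sum reconstruction is:

$$ Y = \frac{a_1 F_1 + a_2 F_2 + \cdots + a_n F_n}{a_1 + a_2 + \cdots + a_n} \quad (2) $$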
In addition, we obtain a factor loading matrix of tags T on factors F. The rotated factor loading matrix B = {b_mn} is obtained by applying Varimax rotation to the initial loading matrix. This rotation keeps the factors orthogonal to each other while maximizing the variance of the loadings on each factor, which makes the factors easier to interpret.
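A sketch of this factor-analysis step, under the assumption that `S` is the standardized image-by-tag score matrix of Fig. 4, might look as follows; scikit-learn's varimax rotation (available from version 0.24) stands in for whatever statistical package the authors used.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def emotional_dimensions(S: np.ndarray, n_factors: int):
    """Return the rotated loading matrix B (tags x factors) and the
    relative variance contribution rates A of the factors."""
    fa = FactorAnalysis(n_components=n_factors, rotation="varimax")
    fa.fit(S)                    # rows of S: images, columns: tags
    B = fa.components_.T         # rotated loadings b_mn, tags x factors
    explained = (B ** 2).sum(axis=0)   # variance captured per factor
    A = explained / explained.sum()    # contribution rates a_n
    return B, A
```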
A quantification model of each emotional dimension can then be obtained, as in (3).
Substituting (3) into (2), we obtain the emotional cognitive quantification model of Y.
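Formula (3) and the combined model are also not reproduced. From the definitions of B and A above, they would plausibly take the forms:

$$ F_n = \sum_{m} b_{mn} T_m \quad (3) $$

$$ Y = \sum_{n} \frac{a_n}{\sum_{j} a_j} \sum_{m} b_{mn} T_m $$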
3 Image Emotional Semantic Retrieval System
In an image emotional semantic retrieval system, a key task is to find the most appropriate images for a given tag and the most appropriate tags for a given image; that is, to find the best-matching tag-image pairs.
The system has three main functional modules:
First, recommend relevant images according to users' search terms.
Second, recommend relevant tags for the images that users accept.
Third, expand the tag-image database based on users' feedback on the recommended results.
Given an input tag Ta, the recommended image set IR = {Ir1, Ir2, …, Iri}, and the recommended tag set Tr = {Tr1, Tr2, …, Tri}, the flow of the system is divided into the following steps (Fig. 5).
3.1 Emotional Semantic Modeling
Candidate Tag Selection.
Given an initial tag Ta, the tag set T = {T1, T2, …, Ti} of all tags associated with it is collected based on the co-occurrence principle: the more images two tags are annotated on together, the stronger the cognitive link between them. To avoid tag noise, a threshold, usually an empirical value, needs to be set; experimental evidence suggests that a threshold of 10 gives the best performance.
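A minimal sketch of this candidate selection, assuming `image_tags` maps image ids to tag sets and using the empirical threshold of 10 from the text:

```python
from collections import Counter

def associated_tags(initial_tag: str, image_tags: dict, threshold: int = 10):
    """Tags co-annotated with initial_tag on at least `threshold` images."""
    cooccur = Counter()
    for tags in image_tags.values():
        if initial_tag in tags:
            for t in tags:
                if t != initial_tag:
                    cooccur[t] += 1
    return {t for t, n in cooccur.items() if n >= threshold}
```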
Merging these co-occurring tags by synonym yields the candidate tag set Tc = {Tc1, Tc2, …, Tci}.
Aggregation of Candidate Tags.
Using an external semantic knowledge database, WordNet (for English) or CSC (for Chinese), a semantic association matrix is built according to the semantic relevance of the tags in Tc, as shown below. Correlation values range from 0 to 1; the higher the value, the stronger the relevance.
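A rough sketch of building such a matrix with WordNet through NLTK (for English only; the paper's CSC database for Chinese is not modeled here). Note that WordNet path similarity is defined over the noun/verb taxonomies, so for emotional adjectives it is only a crude stand-in, and unmatched pairs fall back to 0.

```python
import numpy as np
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def relevance(w1: str, w2: str) -> float:
    """Best path similarity between any synsets of the two words, else 0."""
    best = 0.0
    for s1 in wn.synsets(w1):
        for s2 in wn.synsets(w2):
            sim = s1.path_similarity(s2)
            if sim is not None:
                best = max(best, sim)
    return best

def association_matrix(tags):
    """Symmetric tag-tag relevance matrix with 1.0 on the diagonal."""
    n = len(tags)
    M = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            M[i, j] = M[j, i] = relevance(tags[i], tags[j])
    return M
```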
Then the candidate tags are clustered to generate several original semantic clusters. A representative tag is selected for each cluster, forming the emotional elements set Tp = {Tp1, Tp2, …, Tpm}.
Emotional Semantic Modeling.
Based on the website's image-tag database, factor analysis is performed on Tp to obtain the emotional dimension set F = {F1, F2, …, Fn} and its variance contribution rates A = {a1, a2, …, an}; the emotional cognitive model of Ta can then be expressed as follows.
According to the rotated factor loading matrix B = {bmn}, the quantification model of each factor F can be expressed as follows.
Combining the above two expressions yields the emotional semantic model of Ta in terms of the tag elements Tp.
3.2 Images Ranking and Tags Selection
Ranking of Images to Recommend.
The score of each image on the elements Tp is s_im. The emotional value of each image on tag Ta can be calculated using formula (8), and the recommended images are sorted by this emotional value from high to low.
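Since formula (8) is not reproduced here, the following sketch simply composes the weighted sums reconstructed for (2) and (3) above; `S`, `B`, and `A` are the score matrix, rotated loadings, and contribution rates carried over from the earlier sketches, all assumptions rather than the paper's exact formulation.

```python
import numpy as np

def rank_images(S: np.ndarray, B: np.ndarray, A: np.ndarray):
    """Rank images by emotional value on the query tag Ta, high to low."""
    F = S @ B                 # image scores on each emotional dimension
    y = F @ (A / A.sum())     # weighted emotional value per image
    order = np.argsort(-y)    # indices sorted by descending value
    return order, y[order]
```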
Selection of Tags to Recommend.
For each image, the highest-scoring tag Tp1 belongs to an emotional dimension F that also contains other tags {Tp2, Tp3, …, Tpi}. These tags are recommended according to their weights in formula (7), as sketched below.
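A small illustrative sketch of this selection rule, simplified to the single top dimension (the Sect. 4.3 example draws from the top tag of each of several dimensions) and assuming dimension membership is decided by each tag's largest absolute loading; the paper's exact rule is in formula (7), which is not reproduced here.

```python
import numpy as np

def recommend_tags(scores, B, tags, max_tags=7):
    """Recommend up to max_tags tags for one image."""
    top = int(np.argmax(scores))          # highest-scoring tag Tp1
    dim = int(np.argmax(np.abs(B[top])))  # its emotional dimension F
    # Rank remaining tags by their loading on that dimension; tags of
    # the same dimension naturally rank highest.
    others = sorted(
        (m for m in range(len(tags)) if m != top),
        key=lambda m: -abs(B[m, dim]),
    )
    return [tags[top]] + [tags[m] for m in others[: max_tags - 1]]
```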
Feedback Collection.
According to users' browsing behaviors, the system collects their various annotation activities, which generate explicit or implicit tags that enrich and expand the image-tag database.
4 Simulation Experiment
4.1 Experiment Setting
"Daqi" was set as the target emotional term, and 25 testers (12 males and 13 females) were invited to participate in the experiment. 150 images from appliances, furniture, transportation, construction, utensils, jewelry, and other fields were selected as stimuli. Testers were asked to grade the correlation between the images and terms related to "Daqi": non-relevant 0, neutral 0.5, relevant 1. The experimental data were used to build the "Daqi" emotional semantic quantification model, and validation tests were then performed.
4.2 Experiment Process
Construct Emotional Quantification Model.
From the literature, online comments, websites, etc., more than 140 adjectives co-occurring with "Daqi" were collected and then merged into 45 terms by synonym. A semantic correlation matrix (Fig. 6) of these 45 emotional terms was built based on the researchers' professional experience and cognition.
Through clustering analysis (Ward's method), 16 representative emotional terms, the emotional elements, one for each cluster, were obtained: Quality, Generous, Uniform, Smooth, Solemnly, Full, Rounded, Elegant, Simple, Artless, Pretty, Delicate, Angular, Hard, Huge, Uninhibited.
The 25 testers were asked to score the degree of correlation between the 150 stimulus images and the 16 emotional terms. The result of factor analysis on the experimental data is shown in Fig. 7. The KMO measure is 0.789, so the data are suitable for factor analysis.
To ensure a reasonable explanation, we chose the factor combination whose cumulative contribution reaches 83.223%, comprising five factors; the extraction degree of each emotional term exceeds 75%. Expression (9) of the five factors is obtained as follows (Fig. 8).
According to the component score coefficient matrix, the "Daqi" emotional cognition model can be obtained, as follows.
Extract Emotional Dimensions.
According to the rotated component matrix, five emotional dimensions can be determined; the emotional composition of each dimension is as follows (Fig. 9).
- F1 (Quality, Generous, Delicate, Elegant, Smooth, Uniform)
- F2 (Angular, Rounded, Hard, Full)
- F3 (Artless, Simple)
- F4 (Uninhibited, Huge, Pretty)
- F5 (Solemnly).
The correlation matrix between the various emotional components is shown in Fig. 10.
4.3 Experimental Verification
Verification of Model Calculated Value.
To compare cognition within a similar domain, "appliances" served as the test category. 10 appliance images (Fig. 11) were scored for relevance to "Daqi" by 100 volunteers (48 male and 52 female). The average scores were used to rank the images, and the ranking was compared with the theoretical values of model (10) (Fig. 12).
The predicted trend deviates at only two points (No. 7 and No. 2), giving a coincidence degree of 80%. The established model can thus basically predict the emotional tendency of images.
Verification of Tag Recommended.
Based on the emotional dimensions and the relevance between terms, relevant tags were recommended for each image (at most 7). The 25 testers (12 male and 13 female) were asked to choose the related tags from the recommended ones, and the usage rate, i.e., the proportion of selected tags among those provided, was calculated.
Taking image No. 10 as an example, its highest-scoring tags are Generous, Simple, and Rounded, in the F1, F3, and F2 dimensions, respectively. Combining the relevance matrix gives the following:
- Generous → F1 → Quality (0.815), Smooth (0.701), Uniform (0.652), Elegant (0.612), Delicate (0.524)
- Rounded → F2 → Full (0.512)
- Simple → F3 → Artless (0.668)
The recommended tags are as follows: Generous, Rounded, Simple, Quality, Smooth, Artless, Uniform.
In this way, the recommended tags for the above 10 images and the average usage rates are as follows (Fig. 13).
The average adoption rate of the recommended tags is 89.6%, indicating that it is reasonable to recommend tags based on the emotional dimensions and the relevance between emotional terms.
5 Conclusion
This paper presents an initial vision of an image emotional semantic retrieval mechanism based on a cognitive quantification model. Its core idea is to use the semantic cognitive relevance of tags to decompose a specific emotion into relevant emotional dimensions and to construct an emotional semantic cognition model from them.
At the same time, based on behavioral psychology, tag generation channels are expanded by including users' retrieval behaviors that signal recognition, which provides more data for modeling and makes the model more representative.
Since images need substantial exposure to accumulate the data required for an accurate model, this emotional semantic modeling approach is limited for cold-start images.
It is foreseeable that the theory of this research can be applied to other social digital resources, like music or video.
References
Ames, M., Naaman, M.: Why we tag: motivations for annotation in mobile and online media. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM (2007)
Budanitsky, A., Hirst, G.: Evaluating wordnet-based measures of lexical semantic relatedness. Comput. Linguist. 32(1), 13–47 (2006)
Cambria, E., Hussain, A., Havasi, C., Eckl, C.: Sentic computing: exploitation of common sense for the development of emotion-sensitive systems. In: Esposito, A., Campbell, N., Vogel, C., Hussain, A., Nijholt, A. (eds.) Development of Multimodal Interfaces: Active Listening and Synchrony. LNCS, vol. 5967, pp. 148–156. Springer, Heidelberg (2010). doi:10.1007/978-3-642-12397-9_12
Fukumoto, T.: An analysis of image retrieval behavior for metadata type image database. Inf. Process. Manag. 42(3), 723–728 (2006)
Hanbury, A.: A survey of methods for image annotation. J. Visual Lang. Comput. 19(5), 617–627 (2008)
Li, X., et al.: Low-rank image tag completion with dual reconstruction structure preserved. Neurocomputing 173, 425–433 (2016)
Liu, N., et al.: Multimodal recognition of visual concepts using histograms of textual concepts and selective weighted late fusion scheme. Comput. Vis. Image Underst. 117(5), 493–512 (2013)
Liu, Y., et al.: A survey of content-based image retrieval with high-level semantics. Pattern Recogn. 40(1), 262–282 (2007)
Lu, Y., et al.: A unified framework for semantics and feature based relevance feedback in image retrieval systems. In: Proceedings of the Eighth ACM International Conference on Multimedia. ACM (2000)
Matusiak, K.K.: Information seeking behavior in digital image collections: a cognitive approach. J. Acad. Librariansh. 32(5), 479–488 (2006)
Müller, H., et al.: Learning from user behavior in image retrieval: application of market basket analysis. Int. J. Comput. Vis. 56(1–2), 65–77 (2004)
Sánchez-Rada, J.F., Iglesias, C.A.: Onyx: a linked data approach to emotion representation. Inf. Process. Manag. 52(1), 99–114 (2016)
Scherer, K.R.: What are emotions? And how can they be measured? Soc. Sci. Inf. 44(4), 695–729 (2005)
Schmidt, S., Stock, W.G.: Collective indexing of emotions in images. A study in emotional information retrieval. J. Am. Soc. Inf. Sci. Technol. 60(5), 863–876 (2009)
Schröder, M., Pirker, H., Lamolle, M., Burkhardt, F., Peter, C., Zovato, E.: Representing emotions and related states in technological systems. In: Cowie, R., Pelachaud, C., Petta, P. (eds.) Emotion-Oriented Systems, Cognitive Technologies, pp. 369–387. Springer, Heidelberg (2011)
Sun, A., et al.: Tag-based social image retrieval: an empirical evaluation. J. Am. Soc. Inf. Sci. Technol. 62(12), 2364–2381 (2011)
Wang, W., He, Q.: A survey on emotional semantic image retrieval. In: 15th IEEE International Conference on Image Processing, ICIP 2008. IEEE (2008)
Wei-ning, W., et al.: Image retrieval by emotional semantics: a study of emotional space and feature extraction. In: IEEE International Conference on Systems, Man and Cybernetics, SMC 2006. IEEE (2006)
Wu, L., et al.: Tag completion for image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 35(3), 716–727 (2013)
Zha, Z.-J., et al.: Interactive social group recommendation for Flickr photos. Neurocomputing 105, 30–37 (2013)
Zhang, H., Augilius, E., Honkela, T., Laaksonen, J., Gamper, H., Alene, H.: Analyzing emotional semantics of abstract art using low-level image features. In: Gama, J., Bradley, E., Hollmén, J. (eds.) IDA 2011. LNCS, vol. 7014, pp. 413–423. Springer, Heidelberg (2011). doi:10.1007/978-3-642-24800-9_38
Zimmermann, P., et al.: Affective computing – a rationale for measuring mood with mouse and keyboard. Int. J. Occup. Saf. Ergon. 9(4), 539–551 (2003)