Abstract
The goal-driven top-down mechanism plays an important role in object detection and recognition. In this paper, we propose a top-down computational model for goal-driven saliency detection based on the coding-based classification framework. It consists of four successive steps: feature extraction, descriptor coding, contextual pooling, and saliency prediction. In particular, we investigate the effect of spatial context information on saliency detection and propose a block-wise spatial pooling operation that exploits contextual cues at multiple neighborhood scales and orientations. Experimental results on three datasets demonstrate that our method effectively exploits contextual information and achieves state-of-the-art performance on the top-down saliency detection task.
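The contextual pooling step can be sketched as follows. This is a minimal illustration, not the paper's exact configuration: the block geometry (a center cell plus four oriented neighborhood blocks at each radius) and the use of max pooling are assumptions made for the sketch, and the function name `contextual_pool` is hypothetical.

```python
import numpy as np

def contextual_pool(codes, r, c, radii=(1, 2)):
    """Block-wise spatial pooling around grid cell (r, c).

    codes : (H, W, D) array of coded local descriptors on a spatial grid.
    For each radius, max-pools the codes over four oriented neighborhood
    blocks (above, below, left, right of the center cell), then
    concatenates all pooled vectors with the center cell's code.
    """
    H, W, D = codes.shape

    def pool_block(r0, r1, c0, c1):
        # Clip the block to the grid; empty blocks pool to a zero vector.
        r0, c0 = max(r0, 0), max(c0, 0)
        r1, c1 = min(r1, H), min(c1, W)
        if r0 >= r1 or c0 >= c1:
            return np.zeros(D)
        return codes[r0:r1, c0:c1].reshape(-1, D).max(axis=0)

    parts = [pool_block(r, r + 1, c, c + 1)]  # center cell
    for rad in radii:
        # Four oriented context blocks at this neighborhood scale.
        parts.append(pool_block(r - rad, r, c - rad, c + rad + 1))          # above
        parts.append(pool_block(r + 1, r + rad + 1, c - rad, c + rad + 1))  # below
        parts.append(pool_block(r - rad, r + rad + 1, c - rad, c))          # left
        parts.append(pool_block(r - rad, r + rad + 1, c + 1, c + rad + 1))  # right
    return np.concatenate(parts)

# The pooled descriptor has (1 + 4 * len(radii)) * D dimensions and can be
# fed to a linear classifier to predict per-location saliency.
feats = np.random.rand(8, 8, 16)
vec = contextual_pool(feats, 4, 4)
```

Concatenating pools from several radii lets the classifier weigh both near and far context, which is the intuition behind using multiple neighborhood scales and orientations.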

Notes
We use the code of [32] for calculating the AP value.
We adopt the implementation in the LibLinear code package [34].
We obtain the saliency maps for JCDL from the pre-trained models provided by the authors of [20], which can be downloaded from http://faculty.ucmerced.edu/mhyang/pubs.html. For the top-down saliency models in [16] and [13], we produce the corresponding saliency maps using the code from the respective authors' websites: http://ilab.usc.edu/toolkit/downloads.shtml and http://people.csail.mit.edu/tjudd/WherePeopleLook/index.html.
References
Gao, D., Han, S., Vasconcelos, N. (2009). Discriminant saliency, the detection of suspicious coincidences, and applications to visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 989–1005.
Lee, Y.B., & Lee, S. (2011). Robust face detection based on knowledge-directed specification of bottom-up saliency. ETRI Journal, 33(4), 600–610.
Achanta, R., & Susstrunk, S. (2009). Saliency detection for content-aware image resizing. In Proceedings of IEEE international conference on image processing (pp. 1005–1008).
Guo, C., & Zhang, L. (2010). A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression. IEEE Transactions on Image Processing, 19(1), 185–198.
Ma, Q., & Zhang, L. (2008). Image quality assessment with visual attention. In Proceedings of international conference on pattern recognition.
Loupias, E., Sebe, N., Bres, S., Jolion, J.M. (2000). Wavelet-based salient points for image retrieval. In Proceedings of IEEE international conference on image processing (pp. 518–521).
Itti, L., Koch, C., Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.
Harel, J., Koch, C., Perona, P. (2006). Graph-based visual saliency. In Proceedings of advances in neural information processing systems (pp. 545–552).
Bruce, N., & Tsotsos, J. (2006). Saliency based on information maximization. In Proceedings of advances in neural information processing systems (pp. 155–162).
Hou, X., & Zhang, L. (2007). Saliency detection: a spectral residual approach. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 2280–2287).
Hou, X., & Zhang, L. (2008). Dynamic visual attention: Searching for coding length increments. In Proceedings of advances in neural information processing systems (pp. 681–688).
Achanta, R., Hemami, S., Estrada, F., Susstrunk, S. (2009). Frequency-tuned salient region detection. In Proceedings of IEEE international conference on computer vision and pattern recognition (Vol. 1–4, pp. 1597–1604).
Judd, T., Ehinger, K., Durand, F., Torralba, A. (2009). Learning to predict where humans look. In Proceedings of IEEE international conference on computer vision (pp. 2106–2113).
Cheng, M.M., Zhang, G.X., Mitra, N.J., Huang, X.L., Hu, S.M. (2011). Global contrast based salient region detection. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 409–416).
Zhou, Q., Li, N.Y., Yang, Y., Chen, P., Liu, W.Y. (2012). Corner-surround contrast for saliency detection. In Proceedings of IEEE international conference on pattern recognition (pp. 1–4).
Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimizing detection speed. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 2049–2056).
Frintrop, S., Backer, G., Rome, E. (2005). Goal-directed search with a top-down modulated computational attention system. In Proceedings of pattern recognition, LNCS (Vol. 3663, pp. 117–124).
Oliva, A., Torralba, A., Castelhano, M.S., Henderson, J.M. (2003). Top-down control of visual attention in object detection. In Proceedings of IEEE international conference on image processing (pp. 253–256).
Liu, T., Yuan, Z., Sun, J., Wang, J., Zheng, N., Tang, X., Shum, H.-Y. (2011). Learning to detect a salient object. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2), 353–367.
Yang, J., & Yang, M.-H. (2012). Top-down visual saliency via joint CRF and dictionary learning. In Proceedings of IEEE international conference on computer vision and pattern recognition.
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y. (2010). Locality-constrained linear coding for image classification. In Proceedings of IEEE international conference on computer vision and pattern recognition (pp. 3360–3367).
Yang, J., Yu, K., Gong, Y., Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. In Proceedings of IEEE international conference on computer vision and pattern recognition.
Zhu, J., Zou, W., Yang, X., Zhang, R., Zhou, Q., Zhang, W. (2012). Image classification by hierarchical spatial pooling with partial least squares analysis. In Proceedings of British machine vision conference.
Lowe, D.G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60, 91–110.
Yu, K., Zhang, T., Gong, Y. (2009). Nonlinear learning using local coordinate coding. In Proceedings of advances in neural information processing systems (pp. 2223–2231).
Torralba, A., Oliva, A., Castelhano, M.S., Henderson, J.M. (2006). Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychological Review, 113, 766–786.
Hong, R., Wang, M., Xu, M., Yan, S., Chua, T.-S. (2010). Dynamic captioning: video accessibility enhancement for hearing impairment. In Proceedings of ACM multimedia (pp. 421–430).
Wang, M., Hong, R., Yuan, X.-T., Yan, S., Chua, T.-S. (2012). Movie2Comics: towards a lively video content presentation. IEEE Transactions on Multimedia, 14, 858–870.
Carbonetto, P., Dorkó, G., Schmid, C., Kück, H., de Freitas, N. (2008). Learning to recognize objects with little supervision. International Journal of Computer Vision, 77, 219–237.
Russell, B., Torralba, A., Murphy, K., Freeman, W.T. (2007). LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 77, 157–173.
Opelt, A., Pinz, A., Fussenegger, M., Auer, P. (2006). Generic object recognition with boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(3), 416–431.
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A. (2010). The Pascal Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
Shen, X., & Wu, Y. (2012). A unified approach to salient object detection via low rank matrix recovery. In Proceedings of IEEE international conference on computer vision and pattern recognition.
Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., Lin, C.-J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871–1874.
Zhai, Y., & Shah, M. (2006). Visual attention detection in video sequences using spatiotemporal cues. In Proceedings of the 14th annual ACM international conference on multimedia (pp. 815–824).
Qiu, Y., Zhu, J., Zhang, R., Huang, J. (2012). Top-down saliency by multi-scale contextual pooling. In Proceedings of the 13th Pacific-Rim conference on multimedia (pp. 294–305).
Acknowledgments
The authors thank Quan Zhou for valuable comments. We also thank Mingming Cheng, Ming-Hsuan Yang, Laurent Itti, and Tilke Judd for providing the code of their methods for experimental comparison.
Additional information
This work is funded by the Project NSFC (61071155, 61201446), 973 National Program (2010CB731401, 2010CB731406) and STCSM (12DZ2272600).
Cite this article
Zhu, J., Qiu, Y., Zhang, R. et al. Top-Down Saliency Detection via Contextual Pooling. J Sign Process Syst 74, 33–46 (2014). https://doi.org/10.1007/s11265-013-0768-9