Abstract
In this article, we introduce an explicit count-based strategy for building word space models with syntactic contexts (dependencies). We define a filtering method to reduce the size of the explicit word-context vectors. This traditional strategy is compared with a neural embedding (predictive) model that is also based on syntactic dependencies, using the same parsed corpus for both models. In addition, the dependency-based methods are compared with bag-of-words strategies, both count-based and predictive. The results show that our traditional count-based model with syntactic dependencies outperforms the other strategies, including dependency-based embeddings, but only for tasks focused on discovering similarity between words with the same function (i.e. near-synonyms).
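As a rough illustration of the count-based strategy described above (a minimal sketch on an invented toy corpus, not the paper's actual implementation), the following builds explicit word-context vectors from dependency triples, weights them with positive PMI, and applies a simple top-k filter to reduce vector size:

```python
from collections import Counter
from math import log, sqrt

# Toy (head, relation, dependent) triples standing in for parser output.
# These triples are invented for illustration only.
triples = [
    ("drink", "dobj", "water"),
    ("drink", "dobj", "juice"),
    ("pour", "dobj", "water"),
    ("pour", "dobj", "juice"),
    ("drink", "nsubj", "child"),
    ("pour", "nsubj", "child"),
]

# Each dependent word gets syntactic contexts of the form (relation, head).
pair_counts = Counter((dep, (rel, head)) for head, rel, dep in triples)
word_counts = Counter(dep for _, _, dep in triples)
ctx_counts = Counter((rel, head) for head, rel, _ in triples)
total = sum(pair_counts.values())

def ppmi_vector(word):
    """Explicit (sparse) vector of positive PMI weights over contexts."""
    vec = {}
    for (w, ctx), n in pair_counts.items():
        if w != word:
            continue
        pmi = log((n * total) / (word_counts[w] * ctx_counts[ctx]))
        if pmi > 0:
            vec[ctx] = pmi
    return vec

def filter_top(vec, k):
    """Keep only the k highest-weighted contexts (vector reduction)."""
    return dict(sorted(vec.items(), key=lambda kv: -kv[1])[:k])

def cosine(u, v):
    num = sum(u[c] * v[c] for c in u if c in v)
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0
```

On this toy corpus, "water" and "juice" share all their syntactic contexts and come out maximally similar, while "water" and "child" share none, which is the kind of same-function (near-synonym) similarity that dependency contexts favor.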

Notes
We use bow to refer to linear bag-of-words contexts, which must be distinguished from continuous bag-of-words (CBOW). Unlike linear bag-of-words, CBOW uses a continuous distributed representation of the context. It is a learning strategy that tries to predict a word given its context, instead of predicting the context given a word, as in the skip-gram model.
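The difference between the two learning strategies mentioned in this note can be sketched by how the training pairs are constructed (a minimal illustration; the function names and toy input are our own, not from word2vec itself):

```python
def skipgram_pairs(tokens, window=2):
    # Skip-gram: predict each context word from the centre word,
    # so every (centre, context) pair is a separate training example.
    pairs = []
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((w, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    # CBOW: predict the centre word from the whole bag of context words,
    # which is summarised as one continuous distributed representation.
    pairs = []
    for i, w in enumerate(tokens):
        ctx = tuple(tokens[j]
                    for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                    if j != i)
        pairs.append((ctx, w))
    return pairs
```

For the toy sentence ["a", "b", "c"], skip-gram yields six (centre, context) pairs, whereas CBOW yields three (context-bag, centre) pairs, one per token.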
The number of target words differs between the count-based and predictive models because of the various heuristics and thresholds (hyperparameters) used to generate each of them.
Acknowledgments
This research has been partially funded by the Spanish Ministry of Economy and Competitiveness through project FFI2014-51978-C2-1-R. We are very grateful to Omer Levy and Yoav Goldberg for sending us the parsed corpus used to build their embeddings. We are also grateful to the reviewers for their useful comments and suggestions.
Cite this article
Gamallo, P. Comparing explicit and predictive distributional semantic models endowed with syntactic contexts. Lang Resources & Evaluation 51, 727–743 (2017). https://doi.org/10.1007/s10579-016-9357-4