Abstract
Competitive Intelligence allows an organization to keep up with market trends and foresee business opportunities. This practice is mainly performed by analysts who scan a myriad of dispersed and unstructured sources for any piece of valuable information. Here we present MapIntel, a system for acquiring intelligence from vast collections of text data by representing each document as a multidimensional vector that captures its semantics. The system is designed to support complex Natural Language queries and visual exploration of the corpus, potentially helping overburdened analysts find meaningful insights that inform decision-making. The system's search module uses a retriever and re-ranker engine that first finds the nearest neighbors of the query embedding and then passes those results through a cross-encoder model that identifies the most relevant documents. The browsing module also leverages the embeddings by projecting them onto two dimensions while preserving the original landscape, resulting in a map where semantically related documents form topical clusters, which we capture using topic modeling. This map is intended to give a fast overview of the corpus while allowing more detailed exploration and an interactive information encountering process. In this work, we evaluate the system and its components on the 20 newsgroups dataset and demonstrate the superiority of Transformer-based components.
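The two-stage search described in the abstract can be illustrated with a minimal sketch. It is not the MapIntel implementation: `toy_embed` and `toy_pair_score` are hypothetical stand-ins for the document embedding model and the cross-encoder, and an exhaustive cosine-similarity scan stands in for whatever nearest-neighbor index the system actually uses. Only the control flow (retrieve candidates by embedding proximity, then rescore each query-document pair jointly) is meant to mirror the description.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Vector, doc_vecs: List[Vector], k: int) -> List[int]:
    """Stage 1: indices of the k documents whose embeddings are closest to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def rerank(
    query: str,
    docs: List[str],
    candidates: List[int],
    score_pair: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[int, float]]:
    """Stage 2: rescore each (query, document) pair jointly and keep the best top_n."""
    scored = [(i, score_pair(query, docs[i])) for i in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# --- Hypothetical stand-ins (assumptions, not the paper's models) ---

def toy_embed(text: str) -> List[float]:
    """Toy embedding: word counts over a tiny fixed vocabulary."""
    vocab = ["market", "price", "hockey", "game", "space"]
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]


def toy_pair_score(query: str, doc: str) -> float:
    """Toy pair scorer: fraction of query words that also appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


docs = [
    "market price rises as market opens",
    "hockey game ends in overtime",
    "space probe reaches orbit",
    "price of hockey tickets in the market",
]
query = "hockey ticket price market"

doc_vecs = [toy_embed(d) for d in docs]
candidates = retrieve(toy_embed(query), doc_vecs, k=3)
results = rerank(query, docs, candidates, toy_pair_score, top_n=2)

for idx, score in results:
    print(f"{score:.2f}  {docs[idx]}")
```

The split reflects a common design trade-off: the first stage compares precomputed vectors and is cheap enough to run over the whole corpus, while the second stage looks at the query and each candidate together, so it is only applied to the short list that survives retrieval.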
Acknowledgment
This work was supported by the Fundação para a Ciência e Tecnologia of Ministério da Ciência e Tecnologia e Ensino Superior (research grant under the DSAIPA/DS/0116/2019 project).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Silva, D., Bacao, F. (2022). MapIntel: Enhancing Competitive Intelligence Acquisition Through Embeddings and Visual Analytics. In: Marreiros, G., Martins, B., Paiva, A., Ribeiro, B., Sardinha, A. (eds) Progress in Artificial Intelligence. EPIA 2022. Lecture Notes in Computer Science, vol 13566. Springer, Cham. https://doi.org/10.1007/978-3-031-16474-3_49
DOI: https://doi.org/10.1007/978-3-031-16474-3_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16473-6
Online ISBN: 978-3-031-16474-3
eBook Packages: Computer Science (R0)