MapIntel: Enhancing Competitive Intelligence Acquisition Through Embeddings and Visual Analytics

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13566)

Included in the following conference series:

EPIA Conference on Artificial Intelligence

Abstract

Competitive Intelligence allows an organization to keep up with market trends and foresee business opportunities. This practice is mainly performed by analysts scanning for any piece of valuable information in a myriad of dispersed and unstructured sources. Here we present MapIntel, a system for acquiring intelligence from vast collections of text data by representing each document as a multidimensional vector that captures its semantics. The system is designed to handle complex Natural Language queries and visual exploration of the corpus, potentially aiding overburdened analysts in finding meaningful insights to help decision-making. The system's searching module uses a retriever and re-ranker engine that first finds the closest neighbors to the query embedding and then sifts the results through a cross-encoder model that identifies the most relevant documents. The browsing module also leverages the embeddings by projecting them onto two dimensions while preserving the original landscape, resulting in a map where semantically related documents form topical clusters, which we capture using topic modeling. This map aims at promoting a fast overview of the corpus while allowing more detailed exploration and an interactive information-encountering process. In this work, we evaluate the system and its components on the 20 newsgroups dataset and demonstrate the superiority of Transformer-based components.
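The two-stage search described in the abstract (nearest neighbors to the query embedding first, then a pairwise re-ranker over those candidates) can be illustrated with a minimal sketch. This is not the MapIntel implementation: the bag-of-words `embed` function is a toy stand-in for a learned sentence embedding, `pair_score` is a toy stand-in for a cross-encoder, and all function names and parameters (`retrieve`, `search`, `k`, `top_n`) are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k):
    """Stage 1: keep the k documents whose vectors are closest to the query vector."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def pair_score(query, doc):
    """Stage 2 scorer (assumption: stands in for a cross-encoder).

    It looks at the query and the document together and returns the
    fraction of query bigrams that also occur in the document.
    """
    q = query.lower().split()
    d = doc.lower().split()
    q_bigrams = set(zip(q, q[1:]))
    d_bigrams = set(zip(d, d[1:]))
    if not q_bigrams:
        return 0.0
    return len(q_bigrams & d_bigrams) / len(q_bigrams)


def search(query, docs, k=3, top_n=2):
    """Retrieve k candidates cheaply, then re-rank only those candidates."""
    candidates = retrieve(query, docs, k)
    reranked = sorted(candidates, key=lambda d: pair_score(query, d), reverse=True)
    return reranked[:top_n]


if __name__ == "__main__":
    corpus = [
        "card graphics",
        "new graphics card driver",
        "hockey playoffs schedule",
    ]
    print(retrieve("graphics card", corpus, k=2))
    print(search("graphics card", corpus, k=2, top_n=2))
```

In this toy corpus the first stage ranks "card graphics" highest because word order is invisible to the vector comparison, while the pairwise scorer promotes "new graphics card driver" because it contains the query phrase. The design point is that the expensive pairwise scorer runs only on the `k` candidates, not on the whole corpus.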

Acknowledgment

This work was supported by the Fundação para a Ciência e Tecnologia of Ministério da Ciência e Tecnologia e Ensino Superior (research grant under the DSAIPA/DS/0116/2019 project).

Author information

Authors and Affiliations

NOVA IMS, NOVA University of Lisbon, Campus de Campolide, 1070-312, Lisbon, Portugal
David Silva & Fernando Bacao

Corresponding author

Correspondence to David Silva.

Editor information

Editors and Affiliations

ISEP/GECAD, Polytechnic Institute of Porto, Porto, Portugal
Goreti Marreiros
IST/INESC-ID, University of Lisbon, Lisbon, Portugal
Bruno Martins
IST/INESC-ID, University of Lisbon, Porto Salvo, Portugal
Ana Paiva
CISUC, University of Coimbra, Coimbra, Portugal
Bernardete Ribeiro
IST/INESC-ID, University of Lisbon, Porto Salvo, Portugal
Alberto Sardinha

Cite this paper

Silva, D., Bacao, F. (2022). MapIntel: Enhancing Competitive Intelligence Acquisition Through Embeddings and Visual Analytics. In: Marreiros, G., Martins, B., Paiva, A., Ribeiro, B., Sardinha, A. (eds) Progress in Artificial Intelligence. EPIA 2022. Lecture Notes in Computer Science, vol 13566. Springer, Cham. https://doi.org/10.1007/978-3-031-16474-3_49

DOI: https://doi.org/10.1007/978-3-031-16474-3_49
Published: 13 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16473-6
Online ISBN: 978-3-031-16474-3
eBook Packages: Computer Science, Computer Science (R0)
