KP-Rank: a semantic-based unsupervised approach for keyphrase extraction from text data

Muhammad Aman¹,²,
Said Jadid Abdulkadir³ (ORCID: orcid.org/0000-0003-0038-3702),
Izzatdin Abdul Aziz³,
Hitham Alhussian³ &
Israr Ullah⁴

Abstract

Automatic identification of key concepts from text is a central challenge in information extraction, information retrieval, digital libraries, ontology learning, and text analysis. The main difficulty lies in the text data itself: noise, diversity, scale, context dependency, and word sense ambiguity. To cope with this challenge, numerous supervised and unsupervised approaches have been devised. Existing topical clustering-based approaches to keyphrase extraction are domain dependent and overlook the semantic similarity between candidate features when extracting topical phrases. This paper proposes KP-Rank, a semantic-based unsupervised approach for keyphrase extraction. The approach combines Latent Semantic Analysis (LSA) with clustering techniques and introduces a novel frequency-based algorithm for candidate ranking that considers locality-based sentence, paragraph, and section frequencies. To evaluate the proposed method, three benchmark datasets from different domains are used: Inspec, 500N-KPCrowd, and SemEval-2010. The experimental results show that, overall, KP-Rank achieved significant improvements over existing approaches on the selected performance measures.
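
The abstract does not give the ranking formula, so the following is only a minimal sketch of what a locality-based frequency score might look like, not the authors' algorithm. It counts, for each candidate phrase, the number of sentences, paragraphs, and sections in which the phrase occurs, then combines the three counts with a weighted sum. The function names, the weights, the weighted-sum combination, and the naive substring matching are all assumptions made for illustration; the LSA and clustering steps are omitted entirely.

```python
import re
from collections import Counter


def locality_frequencies(sections, candidates):
    """Count sentence, paragraph, and section frequencies per candidate.

    `sections` is a list of sections, each a list of paragraph strings.
    Returns {candidate: (sentence_freq, paragraph_freq, section_freq)}.
    Matching is a naive case-insensitive substring test (an assumption).
    """
    sent_f, para_f, sec_f = Counter(), Counter(), Counter()
    for section in sections:
        seen_in_section = set()
        for paragraph in section:
            seen_in_paragraph = set()
            # Naive sentence split after ., ! or ? followed by whitespace.
            for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
                text = sentence.lower()
                for cand in candidates:
                    if cand.lower() in text:
                        sent_f[cand] += 1
                        seen_in_paragraph.add(cand)
            for cand in seen_in_paragraph:
                para_f[cand] += 1
            seen_in_section |= seen_in_paragraph
        for cand in seen_in_section:
            sec_f[cand] += 1
    return {c: (sent_f[c], para_f[c], sec_f[c]) for c in candidates}


def rank_candidates(sections, candidates, weights=(1.0, 2.0, 3.0)):
    """Rank candidates by a weighted sum of the three locality frequencies.

    The weights are hypothetical; the source does not specify them.
    Ties are broken alphabetically so the ordering is deterministic.
    """
    w_sent, w_para, w_sec = weights
    freqs = locality_frequencies(sections, candidates)
    scores = {
        c: w_sent * s + w_para * p + w_sec * sec
        for c, (s, p, sec) in freqs.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Under these assumptions, a phrase that recurs across several paragraphs or sections outranks one that is repeated only within a single sentence cluster, which is the intuition the abstract attributes to locality-based frequencies.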

References

Adar E, Datta S (2015) Building a scientific concept hierarchy database (schbase). Ann Arbor 1001:48104
Aman M, bin Md Said A, Jadid Abdul Kadir S, Ullah I (2018) Key concept identification: a comprehensive analysis of frequency and topical graph-based approaches. Information 9(5):128
Aman M, bin Md Said A, Kadir SJA, Ullah I (2018) Key concept identification: a sentence parse tree-based technique for candidate feature extraction from unstructured texts. IEEE Access
Barker K, Cornacchia N (2000) Using noun phrase heads to extract document keyphrases. Adv Artif Intell, pp 40–52
Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3(Jan):993–1022
Boudin F (2016) pke: an open source python-based keyphrase extraction toolkit. In: COLING (Demos), pp 69–73
Bougouin A, Boudin F, Daille B (2013) Topicrank: Graph-based topic ranking for keyphrase extraction. In: International joint conference on natural language processing (IJCNLP), pp 543–551
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1-7):107–117
Chandu K, Naik A, Chandrasekar A, Yang Z, Gupta N, Nyberg E (2017) Tackling biomedical text summarization: OAQA at BioASQ 5B. In: BioNLP 2017, pp 58–66
Danesh S, Sumner T, Martin JH (2015) Sgrank: Combining statistical and graphical methods to improve the state of the art in unsupervised keyphrase extraction. In: *SEM, NAACL-HLT, pp 117–126
Danilevsky M, Wang C, Desai N, Ren X, Guo J, Han J (2014) Automatic construction and ranking of topical keyphrases on collections of short documents. In: Proceedings of the 2014 SIAM international conference on data mining, SIAM, pp 398–406
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Am Soc Inform Sci 41(6):391
El-Beltagy SR, Rafea A (2009) Kp-miner: a keyphrase extraction system for english and arabic documents. Inf Syst 34(1):132–144
Florescu C, Caragea C (2017) Positionrank: An unsupervised approach to keyphrase extraction from scholarly documents. In: Proceedings of the 55th Annual meeting of the association for computational linguistics (Volume 1: Long Papers), vol 1, pp 1105–1115
Frank E, Paynter GW, Witten IH, Gutwin C, Nevill-Manning CG (1999) Domain-specific keyphrase extraction. In: 16th International joint conference on artificial intelligence (IJCAI 99), vol. 2. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 668–673
Geiss J (2011) Latent semantic sentence clustering for multi-document summarization. University of Cambridge, Computer Laboratory, Tech. Rep.
Gollapalli SD, Caragea C (2014) Extracting keyphrases from research papers using citation networks. In: AAAI, pp 1629–1635
Gollapalli SD, Li X-L, Yang P (2017) Incorporating expert knowledge into keyphrase extraction. In: AAAI, pp 3180–3187
Hartigan JA, Wong MA (1979) Algorithm AS 136: a k-means clustering algorithm. J R Stat Soc Ser C Appl Stat 28(1):100–108
Hasan KS, Ng V (2010) Conundrums in unsupervised keyphrase extraction: making sense of the state-of-the-art. In: Proceedings of the 23rd international conference on computational linguistics: posters. Association for Computational Linguistics, pp 365–373
Haveliwala TH (2003) Topic-sensitive pagerank: a context-sensitive ranking algorithm for web search. IEEE Trans Knowl Data Eng 15(4):784–796
Hulth A (2003) Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 216–223
Hulth A, Megyesi BB (2006) A study on automatically extracted keywords in text categorization. In: Proceedings of the 21st international conference on computational linguistics and the 44th annual meeting of the association for computational linguistics. Association for Computational Linguistics, pp 537–544
Kang Y-B, Haghighi PD, Burstein F (2014) Cfinder: an intelligent key concept finder from text for ontology development. Expert Syst Appl 41(9):4494–4504
Kashyap A, Han L, Yus R, Sleeman J, Satyapanich T, Gandhi S, Finin T (2016) Robust semantic text similarity using lsa, machine learning, and linguistic resources. Lang Resour Eval 50(1):125–161
Kim SN, Medelyan O, Kan M-Y, Baldwin T (2013) Automatic keyphrase extraction from scientific articles. Lang Resour Eval 47(3):723–742
Klein D, Manning CD (2003) Accurate unlexicalized parsing. In: Proceedings of the 41st annual meeting of the association for computational linguistics
Kwon H, Kim J, Park Y (2017) Applying lsa text mining technique in envisioning social impacts of emerging technologies: the case of drone technology. Technovation 60:15–28
Lahiri S, Choudhury SR, Caragea C (2014) Keyword and keyphrase extraction using centrality measures on collocation networks. arXiv:1401.6571
Le TTN, Le Nguyen M, Shimazu A (2016) Unsupervised keyphrase extraction: Introducing new kinds of words to keyphrases. In: Australasian joint conference on artificial intelligence, Springer, pp 665–671
Lewis DD (1995) Evaluating and optimizing autonomous text classification systems. In: Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, pp 246–254
Li L, Simek O, Lai A, Daggett M, Dagli CK, Jones C (2018) Detection and characterization of human trafficking networks using unsupervised scalable text template matching. In: 2018 IEEE international conference on big data (Big Data). IEEE, pp 3111–3120
Liu Z, Huang W, Zheng Y, Sun M (2010) Automatic keyphrase extraction via topic decomposition. In: Proceedings of the 2010 conference on empirical methods in natural language processing. Association for Computational Linguistics, pp 366–376
Liu Z, Li P, Zheng Y, Sun M (2009) Clustering to find exemplar terms for keyphrase extraction. In: Proceedings of the 2009 conference on empirical methods in natural language processing: Volume 1. Association for Computational Linguistics, pp 257–266
Liu Z, Liang C, Sun M (2012) Topical word trigger model for keyphrase extraction. In: Proceedings of COLING 2012, pp 1715–1730
Liu F, Pennell D, Liu F, Liu Y (2009) Unsupervised approaches for automatic keyword extraction using meeting transcripts. In: Proceedings of human language technologies: The 2009 annual conference of the North American chapter of the association for computational linguistics. Association for Computational Linguistics, pp 620–628
Marcus MP, Marcinkiewicz MA, Santorini B (1993) Building a large annotated corpus of english: the penn treebank. Comput Linguist 19(2):313–330
Martinez-Romo J, Araujo L, Duque Fernandez A (2016) Semgraph: Extracting keyphrases following a novel semantic graph-based approach. J Assoc Inf Sci Technol 67(1):71–82
Marujo L, Gershman A, Carbonell J, Frederking R, Neto JP (2012) Supervised topical key phrase extraction of news stories using crowdsourcing, light filtering and co-reference normalization. In: Proceedings of the eighth international conference on language resources and evaluation (LREC-2012), pp 399–403
Matsuo Y, Ishizuka M (2004) Keyword extraction from a single document using word co-occurrence statistical information. Int J Artif Intell Tools 13(01):157–169
McInnes L, Healy J, Astels S (2017) Hdbscan: Hierarchical density based clustering. J Open Source Softw 2(11):205
Medelyan O, Frank E, Witten IH (2009) Human-competitive tagging using automatic keyphrase extraction. In: Proceedings of the 2009 conference on empirical methods in natural language processing: Volume 3. Association for Computational Linguistics, pp 1318–1327
Mendoza M, Ormeno P, Valle C (2018) Boosting text clustering using topic selection
Merchant K, Pande Y (2018) Nlp based latent semantic analysis for legal text summarization. In: 2018 International conference on advances in computing, communications and informatics (ICACCI), IEEE, pp 1803–1807
Mihalcea R, Tarau P (2004) Textrank: Bringing order into text. In: EMNLP, vol. 4, pp 404–411
Nam KS, Olena M, Min-Yen K, Timothy B (2010) Semeval-2010 task 5: Automatic keyphrase extraction from scientific articles. In: Proceedings of the 5th International workshop on semantic evaluation. Association for Computational Linguistics, pp 21–26
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: Machine learning in Python. J Mach Learn Res 12:2825–2830
Rafiei-Asl J, Nickabadi A (2017) Tsake: a topical and structural automatic keyphrase extractor. Appl Soft Comput 58:620–630
Rijsbergen CJV (1979) Information retrieval. 2nd ed. Newton, MA USA: Butterworth-Heinemann
Saidul HK, Vincent N (2014) Automatic keyphrase extraction: A survey of the state of the art. In: ACL (1), pp 1262–1273
Shen Y, Zhang Q, Zhang J, Huang J, Lu Y, Lei K (2018) Improving medical short text classification with semantic expansion using word-cluster embedding. In: International conference on information science and applications, Springer, pp 401–411
SÜZEK TÖ (2017) Using latent semantic analysis for automated keyword extraction from large document corpora. Turk J Electr Eng Comput Sci 25(3):1784–1794
Teneva N, Cheng W (2017) Salience rank: efficient keyphrase extraction with topic modeling. In: Proceedings of the 55th Annual meeting of the association for computational linguistics (Volume 2: Short Papers), vol 2, pp 530–535
Tomokiyo T, Hurst M (2003) A language model approach to keyphrase extraction. In: Proceedings of the ACL 2003 workshop on multiword expressions: analysis, acquisition and treatment-Volume 18, Association for Computational Linguistics, pp 33–40
Turney P (1997) Extraction of keyphrases from text: Evaluation of four algorithms. National Research Council Canada, Institute for Information Technology
Turney PD (2000) Learning algorithms for keyphrase extraction. Inform Retr 2(4):303–336
Turney PD (2003) Coherent keyphrase extraction via web mining. In: Proceedings of the 18th international joint conference on artificial intelligence, pp 434–439
Turpin A, Scholer F (2006) User performance versus precision measures for simple search tasks. In: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, pp 11–18
Wan X, Xiao J (2008) Single document keyphrase extraction using neighborhood knowledge. In: AAAI, vol 8, pp 855–860
Wan X, Yang J, Xiao J (2007) Towards an iterative reinforcement approach for simultaneous document summarization and keyword extraction. In: ACL, vol 7, pp 552–559
Wang R, Liu W, McDonald C (2014) Corpus-independent generic keyphrase extraction using word embedding vectors. In: Software engineering research conference, vol 39
Ward JH Jr (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244
Xiaojun W, Jianguo X (2008) Collabrank: towards a collaborative approach to single-document keyphrase extraction. In: Proceedings of the 22nd International Conference on Computational Linguistics-Volume 1, Association for Computational Linguistics, pp 969–976
Zha H (2002) Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering. In: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, pp 113–120
Zhang Y, Milios E, Zincir-Heywood N (2007) A comparative study on key phrase extraction methods in automatic web site summarization. J Digit Inf Manag 5(5):323
Zhang Q, Wang Y, Gong Y, Huang X (2016) Keyphrase extraction using deep recurrent neural networks on twitter. In: Proceedings of the 2016 conference on empirical methods in natural language processing, pp 836–845

Acknowledgments

This research was fully funded and supported by Universiti Teknologi PETRONAS, under the Yayasan Universiti Teknologi PETRONAS (YUTP), Cost Centre (015LC0-119).

Author information

Authors and Affiliations

Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar, Malaysia
Muhammad Aman
Technology and Development Directorate, National Database and Registration Authority (NADRA), Islamabad, Pakistan
Muhammad Aman
Center for Research in Data Science (CeRDaS), Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar, Malaysia
Said Jadid Abdulkadir, Izzatdin Abdul Aziz & Hitham Alhussian
Department of Computer Science, Virtual University of Pakistan, Lahore, Pakistan
Israr Ullah

Corresponding author

Correspondence to Said Jadid Abdulkadir.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Description of part-of-speech (POS) tags

Table 15 Description of part-of-speech (POS) tags [37]

About this article

Cite this article

Aman, M., Abdulkadir, S.J., Aziz, I.A. et al. KP-Rank: a semantic-based unsupervised approach for keyphrase extraction from text data. Multimed Tools Appl 80, 12469–12506 (2021). https://doi.org/10.1007/s11042-020-10215-x

Received: 23 February 2020
Revised: 16 October 2020
Accepted: 09 December 2020
Published: 11 January 2021
Issue Date: March 2021
DOI: https://doi.org/10.1007/s11042-020-10215-x