Keyword extraction
Keyword extraction is tasked with the automatic identification of terms that best describe the subject of a
document.[1][2]
The terms that represent the most relevant information in a document are variously called key phrases,
key terms, key segments, or simply keywords. Although the names differ, their function is the same: to
characterize the topic discussed in the document. Keyword extraction is an important problem in text
mining, information extraction, information retrieval and natural language processing (NLP).[3]
Keyword assignment vs. extraction
Methods for annotating a document with keywords can be roughly divided into:
keyword assignment (keywords are chosen from a controlled vocabulary or taxonomy) and
keyword extraction (keywords are chosen from words that are explicitly mentioned in
the original text).
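The distinction can be illustrated with a minimal Python sketch. The function names, the toy vocabulary and the stopword list below are hypothetical and chosen only for illustration; they are not taken from the cited sources, and real systems use far richer taxonomies and candidate filters.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def assign_keywords(text, vocabulary):
    """Keyword assignment: choose labels from a fixed controlled vocabulary.

    `vocabulary` maps each label to a set of trigger words (a hypothetical,
    hand-made taxonomy). The label itself need not occur in the text.
    """
    tokens = set(tokenize(text))
    return sorted(label for label, triggers in vocabulary.items() if tokens & triggers)


def extract_keywords(text, stopwords):
    """Keyword extraction: candidates are only words that occur in the text."""
    return sorted(set(tokenize(text)) - stopwords)


text = "The striker scored twice and the goalkeeper saved a penalty."
vocabulary = {
    "football": {"striker", "goalkeeper", "penalty"},
    "finance": {"stock", "bond"},
}
stopwords = {"the", "and", "a"}

assigned = assign_keywords(text, vocabulary)    # labels drawn from the vocabulary
extracted = extract_keywords(text, stopwords)   # words drawn from the text itself
```

In this sketch the assigned label "football" never appears in the sentence, whereas every extracted keyword does.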
Methods for automatic keyword extraction can be supervised, semi-supervised, or unsupervised.[4][5]
Unsupervised methods can be further divided into simple statistical, linguistic, and graph-based methods,
as well as ensemble methods that combine some or most of these.[6]
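Two of the unsupervised families can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method of any cited paper: the stopword list is a toy one, the statistical ranker uses raw term frequency only, and the graph ranker applies a PageRank-style update to a co-occurrence graph built after stopword removal and without regard to sentence boundaries. All names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Toy stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "are", "to", "that", "for", "on", "by"}


def candidates(text):
    """Lowercased word tokens with stopwords removed, in document order."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def frequency_keywords(text, k=3):
    """Simple statistics: rank candidates by raw term frequency."""
    counts = Counter(candidates(text))
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(counts, key=lambda w: (-counts[w], w))[:k]


def graph_keywords(text, k=3, window=2, damping=0.85, iterations=50):
    """Graph-based: PageRank-style scores on an undirected co-occurrence graph."""
    words = candidates(text)
    neighbors = defaultdict(set)
    # Link each word to the distinct words that follow it within the window.
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's current score.
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


text = (
    "Keyword extraction identifies keywords in a document. "
    "Graph methods rank keywords by graph centrality."
)
by_frequency = frequency_keywords(text, k=2)
by_graph = graph_keywords(text, k=2)
```

An ensemble method in this setting would combine the two rankings, for example by averaging ranks; that step is omitted here.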
References
1. Beliga, Slobodan; Meštrović, Ana; Martinčić-Ipšić, Sanda (2015). "An Overview of Graph-Based
Keyword Extraction Methods and Approaches" (http://hrcak.srce.hr/file/207669).
Journal of Information and Organizational Sciences. 39 (1): 1–20.
2. Rada Mihalcea; Paul Tarau (July 2004). TextRank: Bringing Order into Texts
(http://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf) (PDF). Proceedings of the
Conference on Empirical Methods in Natural Language Processing (EMNLP 2004).
Barcelona, Spain.
3. Beliga, Slobodan; Meštrović, Ana; Martinčić-Ipšić, Sanda (2014). Toward Selectivity-Based
Keyword Extraction for Croatian News (http://ceur-ws.org/Vol-1310/paper1.pdf) (PDF).
Surfacing the Deep and the Social Web (SDSW 2014). Vol. 1310. Italy: CEUR Proc. pp. 1–14.
4. Alrehamy, H.; Walker, C. (2017). SemCluster: Unsupervised Automatic Keyphrase Extraction
Using Affinity Propagation (https://www.researchgate.net/publication/318436236). 17th UK
Workshop on Computational Intelligence.
5. "Keyword Extraction: from TF-IDF to BERT"
(https://towardsdatascience.com/keyword-extraction-python-tf-idf-textrank-topicrank-yake-bert-7405d51cd839?sk=9276034554ccaf8761d62be1ca471131).
6. Tayfun Pay; Stephen Lucci (2017). Automatic Keyword Extraction: An Ensemble Method.
2017 IEEE International Conference on Big Data (Big Data).
doi:10.1109/BigData.2017.8258552 (https://doi.org/10.1109%2FBigData.2017.8258552).
Further reading
Nazanin Firoozeh; Adeline Nazarenko; Fabrice Alizon; Béatrice Daille (11 November 2019).
"Keyword extraction: Issues and methods". Natural Language Engineering. 26 (3): 259–291.
doi:10.1017/S1351324919000457 (https://doi.org/10.1017%2FS1351324919000457).
ISSN 1351-3249 (https://www.worldcat.org/issn/1351-3249). Wikidata Q109971296.