Improving User Intent Detection in Urdu Web Queries with Capsule Net Architectures
Figure 2. Frequency distribution of web queries with respect to query length (number of queries on the Y-axis, query length on the X-axis).
Figure 3. Intent annotation process for the Urdu web queries dataset.
Figure 4. U-IntentCapsNet architecture for intent detection using a capsule neural network.
Figure 5. Accuracy and loss curves during training of the proposed U-IntentCapsNet model.
Figure 6. Learning curves (accuracy) of the baseline models with pre-trained W2V embeddings and the proposed U-IntentCapsNet model.
Figure 7. Embedding vector representation with W2V for sample two-term queries.
Figure 8. Influence of routing iterations on the proposed U-IntentCapsNet model.
Abstract
1. Introduction
- The development of a customized neural-network-based model for intent detection, namely U-IntentCapsNet, utilizing LSTM cells and an iterative routing mechanism between capsules to effectively discriminate diversely expressed search intents (presented in Section 5).
- A rigorous performance evaluation of the proposed model, demonstrating state-of-the-art results for intent detection and outperforming several strong baselines and alternative classification techniques (presented in Section 7).
2. Related Works
3. Urdu Web Queries Dataset (UWQ-22)
4. Dataset Annotation
4.1. Data Pre-Processing
4.2. Annotation Rules
4.2.1. Navigational Intent
- Queries containing domain suffixes, e.g., زمین ۔ کوم (Zameen.com), and دراز ڈاٹ پی کے (Daraz.pk);
- Queries with the names of online platforms, e.g., زوم (Zoom), ویکیپیڈیا (Wikipedia), فیسبوک (Facebook) and ایمیزون (Amazon);
- Queries with an organizational or brand name, e.g., ڈذنی لینڈ (Disneyland), ریڈیو پاکستان (Radio Pakistan), and سماء نیوز (SAMAA News).
4.2.2. Transactional Intent
- Queries containing multimedia and text-based file formats or extensions, e.g., سورہ رحمن mp3 (Quranic Verse Surah Rahman mp3), and ہیری پوٹر pdf (Harry Potter pdf);
- Queries containing terms related to entertainment videos or audios, books, and coursework, e.g., ارتغل ترقی ڈرامہ (Ertugrul Turkish drama), ہیری پوٹر (Harry Potter), CSS جغرافیائی سلیبس (Geography Syllabus CSS), and سورہ رحمن (Quranic Verse Surah Rahman);
- Queries containing terms related to technology downloads (i.e., software, applications, anti-virus, and drivers), e.g., 11 ونڈو (Windows 11), and آن لائن ہاکی (online hockey);
- Queries with e-commerce and booking-related terms, e.g., لاہور سے استنبول کی پروازیں (flights from Lahore to Istanbul), مالدیپ ہوٹل ریٹ (hotel rates in the Maldives), تاج محل کا ٹکٹ (Taj Mahal ticket), and سوزوکی مہران برائے فروخت (Suzuki Mehran for sale);
- Queries with terms related to the weather, maps, and calculators, e.g., اج کا لاہور کا موسم (today’s Lahore weather), اسام آباد سے مری کا فاصلہ (distance between Islamabad and Murree), and پیکجز مال کا راستہ (route to Packages Mall).
4.2.3. Informational Intent
- Queries having interrogative phrases, e.g., نینو ٹیکنالوجی کیا ہے؟ (what is nanotechnology?), کشمیر پریمئر لیگ کون جیتا (who won the Kashmir Premier League?), and کس ملک کی آبادی سب سے کم ہے (which country has the least population?);
- Queries containing names of celebrities or famous personalities, e.g., کپل شرما (Kapil Sharma), فیض احمد فیض (Faiz Ahmad Faiz), and غالب (Ghalib);
- Queries with natural language terms, e.g., موبائل لاک کھولنے کا طریقہ (procedure for unlocking a mobile phone), کرونا ویکسین کی ایجاد (invention of the Corona vaccine), کرکٹ ورلڈ کپ (the Cricket World Cup), and کرپٹو کرنسی (cryptocurrency).
4.3. Manual Annotation and Consistency Evaluation
4.4. Intent Annotated Dataset Statistics
5. Proposed Model Architecture
5.1. Capsule Network
5.2. Proposed U-IntentCapsNet
5.2.1. Embedding Layer
5.2.2. WordCaps
5.2.3. Dynamic Routing between Capsules
Algorithm 1: Dynamic Routing by Agreement.

1: procedure DynamicRouting(û_{k|t}, r)
2:   for each WordCaps t and IntentCaps k: b_{kt} ← 0
3:   for r iterations do
4:     for all WordCaps t: c_t ← softmax(b_t)
5:     for all IntentCaps k: s_k ← Σ_t c_{kt} û_{k|t}
6:     for all IntentCaps k: v_k ← squash(s_k)
7:     for all WordCaps t and IntentCaps k: b_{kt} ← b_{kt} + û_{k|t} · v_k
8:   end for
9:   return v_k
10: end procedure
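As a concrete illustration, the routing-by-agreement loop can be sketched in NumPy as follows; the shapes, names, and random inputs are illustrative and not the authors' implementation:

```python
import numpy as np

def squash(s, eps=1e-8):
    # Squashing non-linearity: rescales each vector's norm into [0, 1) while keeping its direction.
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, iterations=2):
    """Route T WordCaps prediction vectors to K IntentCaps.

    u_hat: array of shape (T, K, D) holding the prediction vectors u_hat_{k|t}.
    Returns v of shape (K, D): one output vector per intent capsule.
    """
    T, K, _ = u_hat.shape
    b = np.zeros((T, K))                                      # routing logits b_{kt}
    for _ in range(iterations):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # coupling coefficients: softmax over IntentCaps
        s = np.einsum("tk,tkd->kd", c, u_hat)                 # weighted sum of predictions per IntentCaps
        v = squash(s)                                         # squashed output vectors
        b = b + np.einsum("tkd,kd->tk", u_hat, v)             # agreement update: b_{kt} += u_hat_{k|t} . v_k
    return v

# Toy example: 5 WordCaps, 3 intent classes, capsule dimension 8.
u_hat = np.random.default_rng(0).normal(size=(5, 3, 8))
v = dynamic_routing(u_hat, iterations=2)
# The norm of each v_k, squashed into [0, 1), acts as the activation of intent k.
```

Two routing iterations are used in the sketch, matching the paper's best-performing configuration in the routing-iteration ablation.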
5.2.4. IntentCaps with Max-Margin Loss
6. Experimental Setup
6.1. Dataset
6.2. Baselines
- SVM/NB/MLP/LR (Baselines I–IV): four machine learning baselines were constructed with TF-IDF features to represent the query, using a support vector machine (SVM) with a linear kernel, multinomial naïve Bayes (NB), a multi-layer perceptron (MLP), or logistic regression (LR) as the classifier.
- Convolutional neural network (CNN) (Baseline V): this baseline was set up using the architecture proposed in [37], which applies an n-gram convolutional layer and a pooling operation for text classification.
- LSTM (Baseline VI): in this baseline, a recurrent neural network, namely the long short-term memory (LSTM) network [29], was used with a unidirectional forward layer, and its last hidden state was used for classification.
- BiLSTM (Baseline VII): in this baseline, a bi-directional long short-term memory (BiLSTM) network [28], which processes the query in both forward and backward directions, was used, and the last hidden state was used for classification.
- C-LSTM (Baseline VIII): this baseline was developed according to the text classification architecture proposed in [38], in which convolutional and recurrent layers are concatenated: the output of a CNN layer is fed to an LSTM for classification.
- W2V-100/200/300 + U-IntentCapsNet (Baselines IX–XI): in these baselines, the BiLSTM-based embedding layer was excluded from the proposed U-IntentCapsNet model, and the pre-trained Urdu W2V 100/200/300-dimensional embeddings reported in [39] were used. The Urdu W2V embeddings were trained on a vocabulary of 72,000 words using a window size of five words.
- mBERT + U-IntentCapsNet (Baseline XII): in this baseline, pre-trained mBERT embeddings [31] were used in the proposed U-IntentCapsNet model. A number of search terms used in the queries were underrepresented in the mBERT language model; e.g., the tokenized output of the query “ورلڈ کپ” (world cup) was “‘و’, ‘##رل’, ‘##ڈ’, ‘ک’, ‘##پ’” and that of “کرونا ویکسین” (Corona vaccine) was “‘کر’, ‘##ونا’, ‘وی’, ‘##کس’, ‘##ین’”. Therefore, for optimal tokenization, the vocabulary file of the BERT tokenizer was updated with 2815 out-of-vocabulary (OOV) tokens extracted from the Urdu web queries dataset reported in Section 3. Each added OOV token was assigned a unique token ID; it would otherwise have been mapped to the <UNK> (unknown) symbol.
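To make the effect of the vocabulary extension concrete, the following sketch implements a simplified greedy WordPiece tokenizer. It is not the actual BERT tokenizer, and the tiny vocabulary is illustrative; it only demonstrates how adding whole-word OOV tokens changes the tokenization of “ورلڈ کپ”:

```python
def wordpiece(word, vocab):
    """Greedy longest-match-first WordPiece tokenization (simplified sketch)."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub          # continuation pieces carry the '##' prefix
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]              # no matching piece: the whole word maps to the unknown token
        pieces.append(piece)
        start = end
    return pieces

# Sub-word vocabulary before extension (standing in for mBERT's default vocab).
vocab = {"و", "##رل", "##ڈ", "ک", "##پ"}
before = [wordpiece(w, vocab) for w in "ورلڈ کپ".split()]   # fragmented sub-pieces

# After adding the whole-word OOV tokens extracted from the query dataset:
vocab |= {"ورلڈ", "کپ"}
after = [wordpiece(w, vocab) for w in "ورلڈ کپ".split()]    # whole words
```

With a real BERT tokenizer, the analogous step is appending the OOV tokens to the vocabulary file (or calling `add_tokens` in common implementations) and resizing the model's embedding matrix to match the enlarged vocabulary.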
6.3. Hyperparameters
6.4. Evaluation Metrics
7. Results and Discussion
7.1. Comparison with Baselines of Alternate Classification Techniques
7.2. Comparison with the Baselines Using Pre-Trained Embeddings
7.3. Ablation Results
7.4. Error Analysis
- Skewed dataset: as per general search trends, web query datasets contain more informational queries than navigational and transactional queries [12]. This characteristic was also present in the Urdu web queries dataset, in which 76% of the queries were informational. The maximum inter-class confusion was observed between the NAV–INFO and TRAN–INFO class pairs; however, the proposed model discriminated these classes better than the baselines.
- Named entities: NAV-class queries that did not contain “www” or a domain identifier, e.g., .com or .pk, tended to be misclassified as INFO. A deeper analysis showed that most of those queries contained brand names or other named entities with very low occurrence in the dataset, potentially causing the misclassification.
- Queries with more than one valid intent: due to the nature of the data, a user intent could belong to two intent classes. For this reason, the TRAN class suffered the most, as transactional queries, being more descriptive, adopted the jargon and characteristics of INFO queries. For example, “ڈورےمون” (Doraemon) and “بانگدرہ پر تبصرہ” (analysis of Bang-e-Dara, pointing to a book download) were predicted as INFO although annotated as TRAN, when both labels could be valid. Similar confusion was seen for NAV queries, which were largely misclassified as INFO.
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Jansen, B.J.; Booth, D. Classifying Web Queries by Topic and User Intent. In Proceedings of the CHI’10 Extended Abstracts on Human Factors in Computing Systems, Atlanta, GA, USA, 10–15 April 2010; pp. 4285–4290. [Google Scholar]
- Roy, R.S.; Agarwal, S.; Ganguly, N.; Choudhury, M. Syntactic complexity of web search queries through the lenses of language models, networks and users. Inf. Process. Manag. 2016, 52, 923–948. [Google Scholar] [CrossRef]
- Barr, C.; Jones, R.; Regelson, M. The linguistic structure of English web-search queries. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP, Honolulu, HI, USA, 25–27 October 2008; pp. 1021–1030. [Google Scholar]
- Shafiq, H.M.; Tahir, B.; Mehmood, M.A.; Pinto, D.; Singh, V.; Perez, F. Towards building a Urdu Language Corpus using Common Crawl. J. Intell. Fuzzy Syst. 2020, 39, 2445–2455. [Google Scholar] [CrossRef]
- Broder, A. A Taxonomy of Web Search. SIGIR Forum 2002, 36, 3–10. [Google Scholar] [CrossRef]
- Dou, Z.; Guo, J. Query Intent Understanding. In Query Understanding for Search Engines; Chang, Y., Deng, H., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 69–101. [Google Scholar] [CrossRef]
- Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic routing between capsules. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 3859–3869. [Google Scholar]
- Liu, H.; Liu, Y.; Wong, L.-P.; Lee, L.-K.; Hao, T.; Yang, Z. A Hybrid Neural Network BERT-Cap Based on Pre-Trained Language Model and Capsule Network for User Intent Classification. Complexity 2020, 2020, 8858852. [Google Scholar] [CrossRef]
- Zhang, C.; Li, Y.; Du, N.; Fan, W.; Yu, P. Joint Slot Filling and Intent Detection via Capsule Neural Networks; Association for Computational Linguistics: Florence, Italy, 2019; pp. 5259–5267. [Google Scholar]
- Xia, C.; Zhang, C.; Yan, X.; Chang, Y.; Yu, P. Zero-Shot User Intent Detection via Capsule Neural Networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 3090–3099. [Google Scholar] [CrossRef] [Green Version]
- Shams, S.; Aslam, M.; Martinez-Enriquez, A. Lexical Intent Recognition in Urdu Queries Using Deep Neural Networks. In Advances in Soft Computing, Proceedings of the 18th Mexican International Conference on Artificial Intelligence, MICAI 2019, Xalapa, Mexico, 27 October–2 November 2019; Lecture Notes in Computer Science; Martínez-Villaseñor, L., Batyrshin, I., Marín-Hernández, A., Eds.; Springer: Cham, Switzerland, 2019; Volume 11835, pp. 39–50. [Google Scholar]
- Jansen, B.J.; Booth, D.L.; Spink, A. Determining the informational, navigational, and transactional intent of Web queries. Inf. Process. Manag. 2008, 44, 1251–1266. [Google Scholar] [CrossRef]
- Rose, D.E.; Levinson, D. Understanding User Goals in Web Search. In Proceedings of the 13th international conference on World Wide Web, New York, NY, USA, 17–20 May 2004; pp. 13–19. [Google Scholar]
- Gonzalez-Caro, C.; Baeza-Yates, R. A Multi-Faceted Approach to Query Intent Classification. In String Processing and Information Retrieval. SPIRE 2011; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2011; Volume 7024. [Google Scholar]
- Kang, I.-H.; Kim, G. Query Type Classification for Web Document Retrieval. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval, Toronto, ON, Canada, 28 July–1 August 2003; pp. 64–71. [Google Scholar]
- Pass, G.; Chowdhury, A.; Torgeson, C. A Picture of Search. In Proceedings of the 1st International Conference on Scalable Information Systems, Hong Kong, China, 30 May–1 June 2006; p. 1. [Google Scholar]
- Fernández-Martínez, F.; Luna-Jiménez, C.; Kleinlein, R.; Griol, D.; Callejas, Z.; Montero, J.M. Fine-Tuning BERT Models for Intent Recognition Using a Frequency Cut-Off Strategy for Domain-Specific Vocabulary Extension. Appl. Sci. 2022, 12, 1610. [Google Scholar] [CrossRef]
- Trang, N.T.T.; Anh, D.T.D.; Viet, V.Q.; Woomyoung, P. Advanced Joint Model for Vietnamese Intent Detection and Slot Tagging; Springer International Publishing: Cham, Switzerland, 2022; pp. 125–135. [Google Scholar]
- Schuster, S.; Gupta, S.; Shah, R.; Lewis, M. Cross-Lingual Transfer Learning for Multilingual Task Oriented Dialog; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 3795–3805. [Google Scholar]
- Xu, W.; Haider, B.; Mansour, S. End-to-End Slot Alignment and Recognition for Cross-Lingual NLU; Association for Computational Linguistics: Minneapolis, MN, USA, 2020; pp. 5052–5063. [Google Scholar]
- Hemphill, C.T.; Godfrey, J.J.; Doddington, G.R. The ATIS Spoken Language Systems Pilot Corpus. In Proceedings of the Speech and Natural Language, Hidden Valley, PA, USA, 24–27 June 1990. [Google Scholar]
- Upadhyay, S.; Faruqui, M.; Tür, G.; Dilek, H.; Heck, L. (Almost) Zero-Shot Cross-Lingual Spoken Language Understanding. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 6034–6038. [Google Scholar]
- Balodis, K.; Deksne, D. FastText-Based Intent Detection for Inflected Languages. Information 2019, 10, 161. [Google Scholar] [CrossRef] [Green Version]
- Braun, D.; Hernandez-Mendez, A.; Matthes, F.; Langen, M. Evaluating Natural Language Understanding Services for Conversational Question Answering Systems. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, Saarbrücken, Germany, 15–17 August 2017. [Google Scholar]
- Pinnis, M.; Rikters, M.; Krišlauks, R. Tilde’s Machine Translation Systems for WMT 2018; Association for Computational Linguistics: Brussels, Belgium, 2018; pp. 473–481. [Google Scholar]
- Zhang, H.W.S.L.L.C.D.; Xinlei, Z. Query Classification Using Convolutional Neural Networks. In Proceedings of the 10th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 9–10 December 2017; pp. 441–444. [Google Scholar]
- Hashemi, H.B.; Asiaee, A.; Kraft, R. Query Intent Detection using Convolutional Neural Networks. In Proceedings of the International Conference on Web Search and Data Mining, Workshop on Query Understanding, San Francisco, CA, USA, 22–25 February 2016. [Google Scholar]
- Sreelakshmi, K.; Rafeeque, P.C.; Sreetha, S.; Gayathri, E.S. Deep Bi-Directional LSTM Network for Query Intent Detection. Procedia Comput. Sci. 2018, 143, 939–946. [Google Scholar] [CrossRef]
- Ravuri, S.V.; Stolcke, A. Recurrent neural network and LSTM models for lexical utterance classification. In Proceedings of the Sixteenth Annual Conference of the International Speech Communication Association (INTERSPEECH), Dresden, Germany, 6–10 September 2015. [Google Scholar]
- Staliūnaitė, I.; Iacobacci, I. Auxiliary Capsules for Natural Language Understanding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 8154–8158. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; Volume 2, pp. 3111–3119. [Google Scholar]
- Sarigil, E.; Yilmaz, O.; Altingovde, I.S.; Ozcan, R.; Ulusoy, Ö. A “Suggested” Picture of Web Search in Turkish. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2016, 15, 24. [Google Scholar] [CrossRef] [Green Version]
- Cohen, J. A Coefficient of Agreement for Nominal Scales. Educ. Psychol. Meas. 1960, 20, 37–46. [Google Scholar] [CrossRef]
- Bin Zia, H.; Raza, A.A.; Athar, A. Urdu Word Segmentation using Conditional Random Fields (CRFs). In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–26 August 2018; pp. 2562–2569. [Google Scholar]
- Nasim, Z.; Haider, S. Cluster analysis of Urdu tweets. J. King Saud Univ.—Comput. Inf. Sci. 2020, 34, 2170–2179. [Google Scholar] [CrossRef]
- Kim, Y. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 26–28 October 2014; pp. 1746–1751. [Google Scholar]
- Zhou, C.; Sun, C.; Liu, Z.; Lau, F.C.M. A C-LSTM Neural Network for Text Classification. arXiv 2015, arXiv:1511.08630. [Google Scholar]
- Ehsan, T.; Khalid, J.; Ambreen, S.; Mustafa, A.; Hussain, S. Improving Phrase Chunking by using Contextualized Word Embeddings for a Morphologically-Rich Language. Arab. J. Sci. Eng. 2022, 47, 9781–9799. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Sr. No. | Category | Count |
---|---|---|
1. | Queries | 11,751 |
2. | Unique queries | 8697 |
3. | Query terms | 42,214 |
4. | Unique query terms | 38,789 |
5. | Mean query length | 3.78 |
6. | Users | 165 |
7. | Domains | 11 |
Sr. No. | Domain | # of Queries | # of Terms |
---|---|---|---|
1. | Books | 1068 | 3562 |
2. | Business | 973 | 3573 |
3. | Entertainment | 1253 | 2820 |
4. | Health | 1112 | 3883 |
5. | Travel | 1080 | 4416 |
6. | Technology | 1087 | 3894 |
7. | News | 1103 | 4323 |
8. | Fact–Info | 1017 | 4023 |
9. | Shopping | 1020 | 3847 |
10. | Geography | 964 | 3797 |
11. | Sports | 1074 | 4076 |
Total | | 11,751 | 42,214 |
Sr. No. | Domain | Min. | Max. | Mean |
---|---|---|---|---|
1. | Books | 1 | 18 | 3.4 |
2. | Business | 1 | 21 | 3.7 |
3. | Entertainment | 1 | 31 | 3.4 |
4. | Health | 1 | 27 | 3.5 |
5. | Travel | 1 | 22 | 4.1 |
6. | Technology | 1 | 24 | 3.6 |
7. | News | 1 | 27 | 4.0 |
8. | Fact–Info | 1 | 20 | 4.0 |
9. | Shopping | 1 | 38 | 3.9 |
10. | Geography | 1 | 20 | 4.0 |
11. | Sports | 1 | 27 | 3.8 |
Sr. No. | Intent | Frequency | Coverage |
---|---|---|---|
1. | INFO | 6495 | 76.25% |
2. | NAV | 845 | 9.92% |
3. | TRAN | 1178 | 13.82% |
Total | | 8518 | |
Sr. No. | Intent | Frequency | Min. | Max. | Mean |
---|---|---|---|---|---|
1. | INFO | 6495 | 1 | 31 | 4.0 |
2. | NAV | 845 | 1 | 15 | 3.2 |
3. | TRAN | 1178 | 1 | 20 | 3.7 |
Sr. No. | Domain | Informational (%) | Transactional (%) | Navigational (%) |
---|---|---|---|---|
1. | Books | 7% | 2% | 28% |
2. | Business | 8% | 15% | 1% |
3. | Entertainment | 9% | 10% | 36% |
4. | Health | 9% | 1% | 5% |
5. | Travel | 7% | 1% | 2% |
6. | Technology | 12% | 9% | 2% |
7. | News | 10% | 7% | 7% |
8. | Fact–Info | 8% | 20% | 4% |
9. | Shopping | 10% | 7% | 4% |
10. | Geography | 11% | 19% | 1% |
11. | Sports | 10% | 8% | 9% |
Sr. No. | Intent | Train | Test | Dev. | Total |
---|---|---|---|---|---|
1. | INFO | 5196 | 650 | 650 | 6495 |
2. | NAV | 677 | 84 | 84 | 845 |
3. | TRAN | 946 | 116 | 116 | 1178 |
Total | | 6819 | 850 | 850 | 8519 |
Intent | Prec. | Recall | F1 | Acc. |
---|---|---|---|---|
INFO | 0.9357 | 0.9523 | 0.9439 | 0.9523 |
NAV | 0.7622 | 0.7485 | 0.7553 | 0.7485 |
TRAN | 0.7907 | 0.7234 | 0.7555 | 0.7234 |
Actual \ Predicted | INFO | NAV | TRAN |
---|---|---|---|
INFO | 1260 | 17 | 22 |
NAV | 37 | 122 | 8 |
TRAN | 57 | 10 | 168 |
Baseline | Model | Prec. | Recall | F1 | Acc. |
---|---|---|---|---|---|
I | SVM | 0.83 | 0.82 | 0.82 | 0.82 |
II | NB | 0.83 | 0.81 | 0.82 | 0.82 |
III | MLP | 0.82 | 0.81 | 0.81 | 0.80 |
IV | LR | 0.81 | 0.79 | 0.79 | 0.78 |
V | CNN | 0.875 | 0.700 | 0.764 | 0.880 |
VI | LSTM | 0.740 | 0.775 | 0.754 | 0.843 |
VII | BLSTM | 0.747 | 0.788 | 0.765 | 0.859 |
VIII | CLSTM | 0.572 | 0.367 | 0.352 | 0.773 |
Proposed | U-IntentCapsNet | 0.911 | 0.911 | 0.908 | 0.908 |
Baseline | Model | Prec. | Recall | F1 | Acc. |
---|---|---|---|---|---|
IX | W2V-100 + U-IntentCapsNet | 0.8110 | 0.8242 | 0.8136 | 0.8242 |
X | W2V-200 + U-IntentCapsNet | 0.8524 | 0.8561 | 0.8533 | 0.8561 |
XI | W2V-300 + U-IntentCapsNet | 0.8566 | 0.8613 | 0.8583 | 0.8613 |
XII | mBERT + U-IntentCapsNet | 0.8432 | 0.8394 | 0.8391 | 0.8394 |
Proposed | U-IntentCapsNet | 0.9083 | 0.9112 | 0.9084 | 0.9112 |
Baseline | Model | INFO | TRAN | NAV |
---|---|---|---|---|
IX | W2V-100 + U-IntentCapsNet | 0.8777 | 0.2947 | 0.1856 |
X | W2V-200 + U-IntentCapsNet | 0.888 | 0.4324 | 0.2745 |
XI | W2V-300 + U-IntentCapsNet | 0.8789 | 0.4522 | 0.0985 |
Proposed | U-IntentCapsNet | 0.9439 | 0.7553 | 0.7555 |
Sr. No. | Two-Term Queries | Cosine Similarity |
---|---|---|
1. | ٹومجیری (Tom, Jerry) | 0.6948 |
2. | ٹیوی (T,V) | 0.3956 |
3. | ایفون (I, Phone) | 0.3090 |
4. | کروناویکسین (Corona, Vaccine) | 0.2731 |
5. | سمارٹفون (Smart, Phone) | 0.2054 |
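The similarity scores in the table above are plain cosine similarities between the W2V vectors of a query's two terms. A minimal sketch, using made-up 4-dimensional vectors standing in for the 300-dimensional W2V embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative vectors for the two terms of a query such as "Tom Jerry";
# real W2V vectors would come from the pre-trained embedding lookup.
v_tom = np.array([0.9, 0.1, 0.3, 0.2])
v_jerry = np.array([0.8, 0.2, 0.4, 0.1])
sim = cosine_similarity(v_tom, v_jerry)
# Terms that co-occur in similar contexts (e.g., "Tom" and "Jerry") yield higher scores.
```

Higher scores indicate that the two terms of a query occupy nearby regions of the embedding space, which is why frequently co-occurring pairs top the table.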
Sr. No. | Corona | Iphone |
---|---|---|
1. | کورونہ | آئیفون |
2. | کورونا | ای فون |
3. | کرونا | آءی فون |
4. | کرو نا | ائ فون |
5. | کرونہ | آئی فون |
Model | Prec. | Recall | F1 | Acc. |
---|---|---|---|---|
U-IntentCapsNet w/o BLSTM | 0.8735 | 0.8777 | 0.8741 | 0.8777 |
U-IntentCapsNet w/o Regularizer | 0.9018 | 0.9048 | 0.9026 | 0.9047 |
U-IntentCapsNet | 0.9083 | 0.9112 | 0.9084 | 0.9112 |
Model | Prec. | Recall | F1 | Acc. |
---|---|---|---|---|
U-IntentCapsNet-100 | 0.8533 | 0.8607 | 0.8500 | 0.8607 |
U-IntentCapsNet-200 | 0.8702 | 0.8754 | 0.8712 | 0.8754 |
U-IntentCapsNet | 0.9083 | 0.9112 | 0.9084 | 0.9112 |
Routing Iterations | F1 | Acc. |
---|---|---|
2 | 0.9084 | 0.9112 |
3 | 0.8994 | 0.9006 |
5 | 0.8994 | 0.9001 |
Shams, S.; Aslam, M. Improving User Intent Detection in Urdu Web Queries with Capsule Net Architectures. Appl. Sci. 2022, 12, 11861. https://doi.org/10.3390/app122211861