Improving the clarity of questions in Community Question Answering networks

Alireza Khabbazan¹,
Ahmad Ali Abin &
Viet-Vu Vu²

Abstract

Every day, thousands of questions are asked on Community Question Answering networks, making these questions and their answers extremely valuable to information seekers around the world. However, a significant proportion of these questions do not elicit proper answers. There are several reasons for this, and a lack of clarity in the question is one of the most important. This study focuses on enhancing the clarity of unclear questions in Community Question Answering networks. In the first step, DistilBERT, which uses Siamese and triplet network structures to produce meaningful sentence embeddings, is combined with HDBSCAN, which copes well with noisy datasets and is less sensitive to density variations, to extract unique features from each question. Questions are then classified as clear or unclear with more than 90% accuracy using an Extremely Randomized Trees ensemble model, known for its robustness to class imbalance. Next, information that could enhance the clarity of unclear questions is extracted by comparing them with similar, clearer questions using Dynamic Time Warping, a versatile technique for time series analysis that is applicable across various domains. Finally, the extracted information is incorporated into the feature vector of unclear questions, based on histogram-coverage methods, to enhance their clarity. When a question is made clearer, the missing information and its importance are shown to the questioner, who can then use them to clarify the question.
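The comparison step mentioned in the abstract relies on Dynamic Time Warping. As a rough illustration only, the textbook form of the DTW distance can be sketched as below. This is not the authors' implementation: the function name `dtw_distance`, the use of plain numeric sequences as input, and the absolute difference as local cost are all assumptions made for the example.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Illustrative sketch: the local cost is the absolute difference, and
    the sequences stand in for whatever feature sequences are compared.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments:
            # repeat b[j-1], repeat a[i-1], or advance both sequences.
            cost[i][j] = local + min(
                cost[i - 1][j],
                cost[i][j - 1],
                cost[i - 1][j - 1],
            )
    return cost[n][m]
```

Because the alignment may stretch either sequence, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero even though the two sequences differ in length, which is the property that makes DTW useful for comparing sequences that are similar but not aligned element by element.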

Data Availability

No datasets were generated or analysed during the current study.

Notes

Datasets link: https://gustav1.ux.uis.no/downloads/ecir2019-qac/ecir2019-qac-data.zip

Funding

This research has been done under the research project QG.21.58 “Researching and developing clustering integrating constraints and deep learning algorithms” of Vietnam National University, Hanoi.

Author information

Ahmad Ali Abin
Present address: Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran

Authors and Affiliations

Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran
Alireza Khabbazan
VNU Information Technology Institute, Vietnam National University, Hanoi, Vietnam
Viet-Vu Vu

Contributions

Alireza Khabbazan designed the model settings, collected the data, conceived the experiments, analyzed the results, and prepared the original draft. Ahmad Ali Abin conceived the original idea and conceptualized and defined the problem. Viet-Vu Vu administered the research and carried out the experiments. All authors reviewed the manuscript.

Corresponding author

Correspondence to Ahmad Ali Abin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

This study was conducted in accordance with general ethical guidelines applicable to community question-answering research. It did not involve human or animal subjects, hence specific ethical committee approval was not required.

Availability of supporting data

The datasets used during the current study are available. Additionally, the datasets generated, model settings, and training processes are available from the corresponding author upon reasonable request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Khabbazan, A., Abin, A.A. & Vu, VV. Improving the clarity of questions in Community Question Answering networks. J Intell Inf Syst 62, 1631–1658 (2024). https://doi.org/10.1007/s10844-024-00847-y

Received: 05 December 2023
Revised: 01 February 2024
Accepted: 02 February 2024
Published: 02 May 2024
Issue Date: December 2024
DOI: https://doi.org/10.1007/s10844-024-00847-y
