Abstract
Every day, thousands of questions are posted on Community Question Answering (CQA) networks, making these questions and their answers extremely valuable to information seekers around the world. However, a significant proportion of these questions never receive proper answers. There are several reasons for this, and lack of clarity in the question is among the most important. This study focuses on improving the clarity of unclear questions in CQA networks. First, DistilBERT, which uses Siamese and triplet network structures to produce meaningful sentence embeddings, is combined with HDBSCAN, which handles noisy datasets well and is less sensitive to density variations, to extract distinctive features from each question. Questions are then classified as clear or unclear by an Extremely Randomized Trees ensemble model, known for its robustness to class imbalance, with an accuracy above 90%. Next, information that could improve the clarity of an unclear question is extracted by comparing it with similar, clearer questions using Dynamic Time Warping, a versatile technique for time series analysis in information systems that is applicable across many domains. Finally, the extracted information is incorporated into the feature vector of the unclear question using histogram-coverage methods. Once a question has been made clearer, the missing information and its importance are shown to the questioner, who can then use it to clarify the question.
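As an illustration of the comparison step, the following is a minimal sketch of a textbook Dynamic Time Warping distance. It is not the authors' implementation: the abstract does not say how questions are turned into sequences, so the sketch assumes each question is already a sequence of numeric feature values, and the names `dtw_distance` and `most_similar` are hypothetical.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences, using |x - y| as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def most_similar(query, candidates):
    """Index of the candidate sequence closest to `query` under DTW (assumed helper)."""
    return min(range(len(candidates)),
               key=lambda k: dtw_distance(query, candidates[k]))
```

Because DTW allows one element to align with several elements of the other sequence, sequences of different lengths can be compared directly, which is why it suits feature sequences whose lengths vary from question to question.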
Data Availability
No datasets were generated or analysed during the current study.
Funding
This research was conducted under the research project QG.21.58 “Researching and developing clustering integrating constraints and deep learning algorithms” of Vietnam National University, Hanoi.
Author information
Contributions
Alireza Khabbazan designed the model settings, collected the data, conceived the experiments, analyzed the results, and prepared the original draft. Ahmad Ali Abin conceptualized the problem, conceived the original idea, and defined the problem. Viet-Vu Vu administered the research and carried out the experiments. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
This study was conducted in accordance with general ethical guidelines applicable to community question-answering research. It did not involve human or animal subjects, hence specific ethical committee approval was not required.
Availability of supporting data
The datasets used during the current study are available. Additionally, the datasets generated, model settings, and training processes are available from the corresponding author upon reasonable request.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Khabbazan, A., Abin, A.A. & Vu, VV. Improving the clarity of questions in Community Question Answering networks. J Intell Inf Syst 62, 1631–1658 (2024). https://doi.org/10.1007/s10844-024-00847-y
DOI: https://doi.org/10.1007/s10844-024-00847-y