Abstract
[Context and motivation] Incompleteness in natural-language requirements is a challenging problem. [Question/problem] A common technique for detecting incompleteness in requirements is checking the requirements against external sources. With the emergence of language models such as BERT, an interesting question is whether language models are useful external sources for finding potential incompleteness in requirements. [Principal ideas/results] We mask words in requirements and have BERT’s masked language model (MLM) generate contextualized predictions for filling the masked slots. We simulate incompleteness by withholding content from requirements and measure BERT’s ability to predict terminology that is present in the withheld content but absent in the content disclosed to BERT. [Contribution] BERT can be configured to generate multiple predictions per mask. Our first contribution is to determine the number of predictions per mask that strikes the best trade-off between effectively discovering omissions in requirements and keeping the level of noise in the predictions manageable. Our second contribution is devising a machine learning-based filter that post-processes predictions made by BERT to further reduce noise. We empirically evaluate our solution over 40 requirements specifications drawn from the PURE dataset [1]. Our results indicate that: (1) predictions made by BERT are highly effective at pinpointing terminology that is missing from requirements, and (2) our filter can substantially reduce noise from the predictions, thus making BERT a more compelling aid for improving completeness in requirements.
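The withholding-and-masking evaluation idea from the abstract can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the hypothetical stub `predict_for_mask` stands in for BERT's top-k MLM predictions, and the sentence and term sets are invented for the example.

```python
def mask_term(sentence: str, term: str) -> str:
    """Replace one occurrence of `term` with the [MASK] token."""
    return sentence.replace(term, "[MASK]", 1)


def predict_for_mask(masked_sentence: str, k: int) -> list[str]:
    """Hypothetical stand-in for BERT's top-k MLM predictions.
    In the actual approach, BERT's MLM would fill the [MASK] slot."""
    return ["encrypt", "store", "transmit", "log", "delete"][:k]


def hits_withheld_terminology(predictions, withheld_terms):
    """A prediction points to an omission if it names terminology that
    appears only in the withheld (undisclosed) part of the specification."""
    return sorted(set(predictions) & set(withheld_terms))


# Simulate incompleteness: "encrypt" is withheld from the disclosed text.
disclosed = "The system shall encrypt all personal data before transmission."
withheld_terms = {"encrypt", "audit", "archive"}

masked = mask_term(disclosed, "encrypt")
preds = predict_for_mask(masked, k=5)
print(masked)
print(hits_withheld_terminology(preds, withheld_terms))
```

Raising `k` surfaces more withheld terminology but also more noise, which is the trade-off the first contribution quantifies and the filter of the second contribution mitigates.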
References
Ferrari, A., Spagnolo, G.O., Gnesi, S.: PURE: a dataset of public requirements documents. In: RE (2017)
Zowghi, D., Gervasi, V.: The three Cs of requirements: consistency, completeness, and correctness. In: REFSQ (2003)
Zowghi, D., Gervasi, V.: On the interplay between consistency, completeness, and correctness in requirements evolution. IST 45(14), 993–1009 (2003)
Arora, C., Sabetzadeh, M., Briand, L.C.: An empirical study on the potential usefulness of domain models for completeness checking of requirements. Empir. Softw. Eng. 24(4), 2509–2539 (2019). https://doi.org/10.1007/s10664-019-09693-x
Ferrari, A., dell’Orletta, F., Spagnolo, G.O., Gnesi, S.: Measuring and improving the completeness of natural language requirements. In: Salinesi, C., van de Weerd, I. (eds.) REFSQ 2014. LNCS, vol. 8396, pp. 23–38. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-05843-6_3
Dalpiaz, F., van der Schalk, I., Lucassen, G.: Pinpointing ambiguity and incompleteness in requirements engineering via information visualization and NLP. In: Kamsties, E., Horkoff, J., Dalpiaz, F. (eds.) REFSQ 2018. LNCS, vol. 10753, pp. 119–135. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77243-1_8
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: NAACL-HLT (2019)
Luitel, D., Hassani, S., Sabetzadeh, M.: Replication package (2023). https://doi.org/10.6084/m9.figshare.22041341
Jurafsky, D., Martin, J.: Speech and Language Processing, 2nd edn. Prentice Hall, Upper Saddle River (2009)
Hey, T., Keim, J., Koziolek, A., Tichy, W.F.: NoRBERT: transfer learning for requirements classification. In: RE (2020)
Ezzini, S., Abualhaija, S., Arora, C., Sabetzadeh, M.: Automated handling of anaphoric ambiguity in requirements: a multi-solution study. In: ICSE (2022)
Mikolov, T., Yih, W., Zweig, G.: Linguistic regularities in continuous space word representations. In: NAACL-HLT (2013)
Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: EMNLP (2014)
Witten, I.H., Frank, E., Hall, M.A.: Data Mining: Practical Machine Learning Tools and Techniques, 4th edn. Morgan Kaufmann, Boston (2017)
Berry, D.M., Cleland-Huang, J., Ferrari, A., Maalej, W., Mylopoulos, J., Zowghi, D.: Panel: context-dependent evaluation of tools for NL RE tasks: recall vs. precision, and beyond. In: RE (2017)
Ezzini, S., Abualhaija, S., Arora, C., Sabetzadeh, M., Briand, L.: Using domain-specific corpora for improved handling of ambiguity in requirements. In: ICSE (2021)
Cui, G., Lu, Q., Li, W., Chen, Y.R.: Corpus exploitation from Wikipedia for ontology construction. In: LREC (2008)
Ferrari, A., Donati, B., Gnesi, S.: Detecting domain-specific ambiguities: an NLP approach based on Wikipedia crawling and word embeddings. In: AIRE (2017)
Ezzini, S., Abualhaija, S., Sabetzadeh, M.: WikiDoMiner: wikipedia domain-specific miner. In: ESEC/FSE (2022)
Berry, D.M., Kamsties, E., Krieger, M.M.: From contract drafting to software specification: linguistic sources of ambiguity, a handbook (2003)
Arora, C., Sabetzadeh, M., Briand, L., Zimmer, F.: Automated checking of conformance to requirements templates using natural language processing. IEEE TSE 41(10), 944–968 (2015)
Arora, C., Sabetzadeh, M., Briand, L., Zimmer, F.: Automated extraction and clustering of requirements glossary terms. IEEE TSE 43(10), 918–945 (2017)
Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: The WEKA Workbench: Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques”, 4th edn. Morgan Kaufmann Publishers Inc., Boston (2016)
Manning, C., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. JMLR 13(2), 1–25 (2012)
Capon, J.A.: Elementary Statistics for the Social Sciences: Study Guide. Wadsworth (1991)
Vargha, A., Delaney, H.: A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J. Educ. Behav. Stat. 25(2), 101–132 (2000)
Bhatia, J., Breaux, T.: Semantic incompleteness in privacy policy goals. In: RE (2018)
Cejas, O.A., Abualhaija, S., Torre, D., Sabetzadeh, M., Briand, L.: AI-enabled automation for completeness checking of privacy policies. IEEE TSE 48(11), 4647–4674 (2022)
Shen, Y., Breaux, T.: Domain model extraction from user-authored scenarios and word embeddings. In: AIRE (2022)
Acknowledgements
This work was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) under the Discovery and Discovery Accelerator programs. We are grateful to Shiva Nejati, Sallam Abualhaija and Jia Li for helpful discussions. We thank the anonymous reviewers of REFSQ 2023 for their constructive comments.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Luitel, D., Hassani, S., Sabetzadeh, M. (2023). Using Language Models for Enhancing the Completeness of Natural-Language Requirements. In: Ferrari, A., Penzenstadler, B. (eds) Requirements Engineering: Foundation for Software Quality. REFSQ 2023. Lecture Notes in Computer Science, vol 13975. Springer, Cham. https://doi.org/10.1007/978-3-031-29786-1_7
Print ISBN: 978-3-031-29785-4
Online ISBN: 978-3-031-29786-1