Abstract
At present, research on vulnerability databases has mainly focused on whether the disclosed information is accurate, while the differences in information between vulnerability databases have received little attention.
This article proposes WITTY (softWare versIon inconsisTency measuremenT sYstem), which detects differences between the affected software versions reported by NVD and those reported by vulnerability databases in other languages (eight databases in total, including the English-language CVE and OpenWall and the Chinese-language CNNVD and CNVD). WITTY enables large-scale quantitative measurement of information consistency across these databases. We introduce custom-designed named entity recognition (NER) and relation extraction (RE) based on deep learning, which lets WITTY recognize previously unseen software names and versions from sentence structure and context. Evaluation against ground truth shows that the system is highly accurate (95.3% accuracy, 89.9% recall). We apply WITTY to data from 8 vulnerability databases covering the past 21 years, comprising 554,725 vulnerability reports. The results show that inconsistency in affected software versions is prevalent. The average exact match rate between NVD and the English-language databases (CVE, OpenWall, and others) is only 22.1%. The average exact match rate for the Chinese CNNVD and CNVD is 49.5%, and the exact match rate for the Russian vulnerability databases is 25.8%.
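The extraction-and-comparison steps described above can be illustrated with a minimal Python sketch. NER tags software names and versions, RE links each version to its software, and the resulting sets are compared with NVD per CVE ID. The sketch rests on several assumptions, none taken from WITTY itself:

- It uses a BIO tagging scheme with hypothetical entity labels `SN` (software name) and `SV` (software version).
- It replaces the deep-learning RE model with a naive "attach each version to the nearest preceding name" heuristic, purely for illustration.
- It treats an "exact match" as two databases listing identical (software, version) sets for the same CVE ID, averaged over CVE IDs present in both.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[str, str, int]  # (entity_type, text, start_token_index)


def decode_bio(tokens: List[str], tags: List[str]) -> List[Span]:
    """Turn per-token BIO tags (e.g. B-SN, I-SN, B-SV, O) into entity spans.

    The labels SN/SV are hypothetical; a real NER model defines its own tag set.
    """
    spans: List[Span] = []
    cur_type: Optional[str] = None
    cur_tokens: List[str] = []
    cur_start = -1
    for i, (tok, tag) in enumerate(zip(tokens, tags)):
        # A B- tag, or an I- tag that does not continue the open span, starts a new span.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cur_type):
            if cur_type is not None:
                spans.append((cur_type, " ".join(cur_tokens), cur_start))
            cur_type, cur_tokens, cur_start = tag[2:], [tok], i
        elif tag.startswith("I-"):
            cur_tokens.append(tok)  # continue the open span
        else:  # "O" closes any open span
            if cur_type is not None:
                spans.append((cur_type, " ".join(cur_tokens), cur_start))
            cur_type, cur_tokens, cur_start = None, [], -1
    if cur_type is not None:
        spans.append((cur_type, " ".join(cur_tokens), cur_start))
    return spans


def pair_versions(spans: List[Span]) -> Set[Tuple[str, str]]:
    """Hypothetical stand-in for relation extraction: link each version
    to the closest software name that precedes it in the sentence."""
    pairs: Set[Tuple[str, str]] = set()
    current_name: Optional[str] = None
    for etype, text, _ in sorted(spans, key=lambda s: s[2]):
        if etype == "SN":
            current_name = text
        elif etype == "SV" and current_name is not None:
            pairs.add((current_name, text))
    return pairs


def exact_match_rate(nvd: Dict[str, Set[Tuple[str, str]]],
                     other: Dict[str, Set[Tuple[str, str]]]) -> float:
    """Fraction of shared CVE IDs whose (software, version) sets are identical."""
    shared = [cve for cve in other if cve in nvd]
    if not shared:
        return 0.0
    hits = sum(1 for cve in shared if nvd[cve] == other[cve])
    return hits / len(shared)
```

Comparing whole sets makes "exact match" strict. A report that lists one extra or one missing version counts as a mismatch, which is why such a metric can be low even when databases broadly agree.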
This work was supported by the National Key Research and Development Program of China (2018YFB0804701), the Key Research and Development Program of Hainan Province (ZDYF202012), and the Guangxi Key Laboratory of Cryptography and Information Security (No. GCIS202123).
© 2022 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ren, H. et al. (2022). Detecting Inconsistent Vulnerable Software Version in Security Vulnerability Reports. In: Cao, C., Zhang, Y., Hong, Y., Wang, D. (eds) Frontiers in Cyber Security. FCS 2021. Communications in Computer and Information Science, vol 1558. Springer, Singapore. https://doi.org/10.1007/978-981-19-0523-0_6
DOI: https://doi.org/10.1007/978-981-19-0523-0_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-0522-3
Online ISBN: 978-981-19-0523-0
eBook Packages: Computer Science, Computer Science (R0)