Abstract
With the emergence of NoSQL databases, many large applications have migrated away from relational databases (RDBs) to take advantage of the greater flexibility and performance that NoSQL offers. Migrating from an RDB to a NoSQL database involves both schema transformation and data migration, and neither is straightforward. The main challenge is that an RDB stores data in normalized form, whereas NoSQL databases favor denormalized data. To address schema transformation, this paper proposes a model called query-based denormalization using hypergraph (QBDNH) for moving from an RDB to a NoSQL database. The model takes the existing relational tables and queries as input and transforms them into a denormalized NoSQL model using hypergraphs. The approach overcomes two limitations of existing graph-based denormalization techniques: the representation of complex relationships and the coverage of data access patterns. The proposed model also reduces the overall time, cost, and effort of transforming the schema compared with doing so manually. To validate the effectiveness of QBDNH, experiments are conducted on the TPC-H dataset, and the performance of QBDNH is compared with that of existing graph-based denormalization models, namely TLD, CLDA, and Kuszera. The evaluation is carried out in two parts: the first analyzes the query speedup factor, while the second measures the efficiency improvement based on query pipeline execution. The results show that QBDNH improves query performance, with speedup factors of 1.29, 1.35, and 1.40 over the TLD, CLDA, and Kuszera models, respectively. Furthermore, QBDNH significantly enhances pipeline utilization compared with TLD and Kuszera.
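To make the hypergraph idea concrete, the following is a minimal sketch and not the QBDNH algorithm itself. It assumes that each relational table is a vertex and that each query contributes one hyperedge covering all the tables it accesses, which is how a single edge can capture a multi-table access pattern that an ordinary graph would need several pairwise edges to express. The query names and the query-to-table mapping are hypothetical; only the table names are taken from TPC-H.

```python
from collections import defaultdict

# Hypothetical workload: each query is mapped to the set of tables it touches.
# Table names follow TPC-H; the query names and mapping are illustrative only.
queries = {
    "q_a": {"customer", "orders", "lineitem"},
    "q_b": {"orders", "lineitem"},
    "q_c": {"supplier", "nation"},
}


def build_hypergraph(queries):
    """Return (vertices, hyperedges): tables are vertices, each query is one hyperedge."""
    vertices = set().union(*queries.values())
    hyperedges = {name: frozenset(tables) for name, tables in queries.items()}
    return vertices, hyperedges


def incidence(hyperedges):
    """Map each table to the set of queries (hyperedges) that contain it."""
    table_to_queries = defaultdict(set)
    for name, tables in hyperedges.items():
        for table in tables:
            table_to_queries[table].add(name)
    return dict(table_to_queries)


vertices, hyperedges = build_hypergraph(queries)
table_to_queries = incidence(hyperedges)
```

In this sketch, tables that share many hyperedges (here `orders` and `lineitem`) are the natural candidates for being stored together in one denormalized document; how QBDNH actually makes that decision is described in the paper body, not here.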
References
Atzeni P, Jensen CS, Orsi G et al (2013) The relational model is dead, SQL is dead, and I don’t feel so good myself. SIGMOD Record 42:64–68. https://doi.org/10.1145/2503792.2503808
Stonebraker M (2010) SQL databases v NoSQL databases. Commun ACM 53:10–11. https://doi.org/10.1145/1721654.1721659
Masataka H, Yutaka W (2022) Making software based on human-driven design case study: SQL for non-experts. Proceedings—2022 IEEE 15th international symposium on embedded multicore/many-core systems-on-chip, MCSoC 2022:264–270. https://doi.org/10.1109/MCSoC57363.2022.00049
Floratou A, Teletia N, DeWitt DJ et al (2012) Can the elephants handle the NoSQL onslaught? In: Proceedings of the VLDB endowment 5:1712–1723. https://doi.org/10.14778/2367502.2367511
Cattell R (2010) Scalable SQL and NoSQL data stores. SIGMOD Record 39:12–27. https://doi.org/10.1145/1978915.1978919
Ali D, Liu C, Mengchi L (2018) A survey on NoSQL stores. ACM Comput Surv (CSUR) 51. https://doi.org/10.1145/3158661
Stonebraker M, Abadi DJ, Batkin A et al (2005) C-Store: A column-oriented DBMS. In: VLDB 2005—Proceedings of 31st international conference on very large data bases 2:553–564. https://doi.org/10.1145/3226595.3226638
Störl U, Klettke M, Scherzinger S (2020) NoSQL schema evolution and data migration: State-of-the-art and opportunities. Adv Database Technol EDBT 2020-March, pp 655–658. https://doi.org/10.5441/002/edbt.2020.87
Lee T, Chams M, Nado R et al (2001) System for detecting migration differences of a customized database schema. Google Patents 17:552–560
Wang Y, Shah R, Criswell A et al (2020) Data migration using datalog program synthesis. In: Proceedings of the VLDB endowment 13:1006–1019. https://doi.org/10.14778/3384345.3384350
Gómez P, Casallas R, Roncancio C (2016) Data schema does matter, even in NoSQL systems! In: Proceedings—international conference on research challenges in information science 2016-Augus, pp 1–6. https://doi.org/10.1109/RCIS.2016.7549340
Kaur K, Rani R (2013) Modeling and querying data in NoSQL databases. In: Proceedings—2013 IEEE international conference on big data, big data 2013, pp 1–7. https://doi.org/10.1109/BigData.2013.6691765
Kuszera EM, Peres LM, Didonet Del Fabro M (2022) Exploring data structure alternatives in the RDB to NoSQL document store conversion process. Inf Syst 105:101941. https://doi.org/10.1016/j.is.2021.101941
Karnitis G, Arnicans G (2015) Migration of relational database to document-oriented database: structure denormalization and data transformation. In: Proceedings—7th International Conference on Computational Intelligence, Communication Systems and Networks, CICSyN, pp 113–118. https://doi.org/10.1109/CICSYN.2015.30
Yoo J, Lee KH, Jeon YH (2018) Migration from RDBMS to NoSQL using column-level denormalization and atomic aggregates*. J Inf Sci Eng 34:243–259. https://doi.org/10.6688/JISE.2018.34.1.15
Chebotko A, Kashlev A, Lu S (2015) A big data modeling methodology for Apache Cassandra. Proceedings—2015 IEEE international congress on big data, bigdata congress 2015:238–245. https://doi.org/10.1109/BigDataCongress.2015.41
Hewasinghage M, Abelló A, Varga J, Zimányi E (2020) DocDesign: cost-based database design for document stores. In: 32nd International conference on scientific and statistical database management (SSDBM), ACM, pp 1–4. https://doi.org/10.1145/3400903.3401689
Hewasinghage M, Abelló A, Varga J, Zimányi E (2021) A cost model for random access queries in document stores. VLDB J 30:559–578. https://doi.org/10.1007/s00778-021-00660-x
Wolf MM, Klinvex AM, Dunlavy DM (2016) Advantages to modeling relational data using hypergraphs versus graphs. In: 2016 IEEE high performance extreme computing conference, HPEC 2016 0–6. https://doi.org/10.1109/HPEC.2016.7761624
TPC-H benchmark. http://www.tpc.org/tpch/
MongoDB (2016) RDBMS to MongoDB migration guide. MongoDB White Paper
Whang JJ, Du R, Jung S et al (2020) MEGA: Multi-view semi-supervised clustering of hypergraphs. In: Proceedings of the VLDB endowment 13:698–711. https://doi.org/10.14778/3377369.3377378
Lee G, Ko J, Shin K (2020) Hypergraph motifs: concepts, algorithms, and discoveries. In: Proceedings of the VLDB endowment 13:2256–2269. https://doi.org/10.14778/3407790.3407823
Ghaleb FFM, Taha AA, Hazman M et al (2020) RDF-BF-Hypergraph representation for relational database. Int J Math Comput Sci 15:41–64
Hewasinghage M, Abelló A, Varga J, Zimányi E (2021) Managing polyglot systems metadata with hypergraphs. Data Knowl Eng 134:101896. https://doi.org/10.1016/j.datak.2021.101896
Mok WY, Embley DW (2006) Generating compact redundancy-free XML documents from conceptual-model hypergraphs. IEEE Trans Knowl Data Eng 18:1082–1096. https://doi.org/10.1109/TKDE.2006.125
Vera-Olivera H, Guo R, Huacarpuma RC et al (2021) Data modeling and NoSQL databases: a systematic mapping review. ACM Comput Surv 54. https://doi.org/10.1145/3457608
Shin SK, Sanders GL (2006) Denormalization strategies for data retrieval from data warehouses. Decis Support Syst 42:267–282. https://doi.org/10.1016/j.dss.2004.12.004
Imam AA, Basri S, Ahmad R et al (2018) Automatic schema suggestion model for NoSQL document-stores databases. Journal of Big Data 5:1–17. https://doi.org/10.1186/s40537-018-0156-1
Imam AA, Basri S, Ahmad R, González-Aparicio MT (2019) Schema proposition model for NoSQL applications. Adv Intell Syst Comput 843:30–39. https://doi.org/10.1007/978-3-319-99007-1_3
Ceresnak R, Dudas A, Matiasko K, Kvet M (2021) Mapping rules for schema transformation: SQL to NoSQL and back. In: International conference on information and digital technologies 2021, IDT 2021 52–58. https://doi.org/10.1109/IDT52577.2021.9497629
Ramzan S, Bajwa IS, Ramzan B, Anwar W (2019) Intelligent data engineering for migration to NoSQL based secure environments. IEEE Access 7:69042–69057. https://doi.org/10.1109/ACCESS.2019.2916912
Serrano D, Han D, Stroulia E (2015) From relations to multi-dimensional maps: towards an SQL-to-HBase transformation methodology. In: Proceedings—2015 IEEE 8th international conference on cloud computing, CLOUD 2015 81–89. https://doi.org/10.1109/CLOUD.2015.21
Shichkina Y, Ha VM (2020) Method for creating collections with embedded documents for document-oriented databases taking into account executable queries. In: SPIIRAS proceedings 19:829–854. https://doi.org/10.15622/sp.2020.19.4.5
Li C (2010) Transforming relational database into HBase: a case study. In: Proceedings 2010 IEEE international conference on software engineering and service sciences, ICSESS 2010, pp 683–687. https://doi.org/10.1109/ICSESS.2010.5552465
Lee CH, Zheng YL (2016) SQL-to-NoSQL schema denormalization and migration: a study on content management systems. In: Proceedings of the 2015 IEEE international conference on systems, man, and cybernetics, SMC 2015, pp 2022–2026. https://doi.org/10.1109/SMC.2015.353
Zhao G, Lin Q, Li L, Li Z (2014) Schema conversion model of SQL database to NoSQL. In: Proceedings—2014 9th international conference on P2P, parallel, grid, cloud and internet computing, 3PGCIC 2014 355–362. https://doi.org/10.1109/3PGCIC.2014.137
Ko HKE, Lee YJK (2020) Techniques and guidelines for effective migration from RDBMS to NoSQL. J Supercomput 76:7936–7950. https://doi.org/10.1007/s11227-018-2361-2
Jia T, Zhao X, Wang Z et al (2016) Model transformation and data migration from relational database to MongoDB. In: Proceedings—2016 IEEE international congress on big data, bigdata congress, pp 60–67. https://doi.org/10.1109/BIGDATACONGRESS.2016.16
Mior MJ, Salem K, Aboulnaga A, Liu R (2017) NoSE: Schema design for NoSQL applications. IEEE Trans Knowl Data Eng 29:2275–2289. https://doi.org/10.1109/TKDE.2017.2722412
Imam AA, Basri S, Ahmad R et al (2018) Data modeling guidelines for NoSQL document-store databases. Int J Adv Comput Sci Appl 9:544–555. https://doi.org/10.14569/IJACSA.2018.091066
The Professional Client, IDE and GUI for MongoDB | Studio 3T. https://studio3t.com/. Accessed 8 Jun 2023
Fleming PJ, Wallace JJ (1986) How not to lie with statistics: The correct way to summarize benchmark results. Commun ACM 29:218–221. https://doi.org/10.1145/5666.5673
Dreseler M, Boissier M, Rabl T, Uflacker M (2020) Quantifying TPC-H choke points and their optimizations. In: Proceedings of the VLDB endowment 13:1206–1220. https://doi.org/10.14778/3389133.3389138
Henry OB (2019) MongoDB aggregation stages and pipelining. White paper, pp 1–38
Funding
Not applicable.
Author information
Contributions
Neha Bansal was involved in writing the original draft, reviewing and editing, conceptualization, methodology, programming, and validation. Shelly Sachdeva helped with supervision, validation, and reviewing and editing. Lalit K. Awasthi contributed to supervision, validation, and reviewing and editing.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
This section lists the names of the pipeline stages (PS) and the total count (TC) of pipeline stages used for each TPC-H query in the TLD, CLDA, Kuszera, and QBDNH models.
See Table 14.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bansal, N., Sachdeva, S. & Awasthi, L.K. Query-based denormalization using hypergraph (QBDNH): a schema transformation model for migrating relational to NoSQL databases. Knowl Inf Syst 66, 681–722 (2024). https://doi.org/10.1007/s10115-023-02017-y
DOI: https://doi.org/10.1007/s10115-023-02017-y