
CN115248829A - Data storage method, data query method and device - Google Patents

Data storage method, data query method and device

Info

Publication number
CN115248829A
Authority
CN
China
Prior art keywords
data
query
routing table
database
databases
Prior art date
Legal status
Pending
Application number
CN202210613771.2A
Other languages
Chinese (zh)
Inventor
赵英超
Current Assignee
Hangzhou Tuya Information Technology Co Ltd
Original Assignee
Hangzhou Tuya Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Tuya Information Technology Co Ltd
Priority to CN202210613771.2A
Publication of CN115248829A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data storage method, a data query method, and a device. The data storage method is applied to a distributed data management system comprising a plurality of databases and comprises: acquiring data to be stored; and storing at least part of the content of the data to be stored into physical tables of each database based on a data routing table of the distributed data management system, where the data routing table records the mapping relationships between a logical table and the physical tables of the plurality of databases. With this method and system, the business party no longer has to maintain database/table sharding or a secondary-index mechanism.

Description

Data storage method, data query method and device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data storage method, a data query method, and an apparatus.
Background
As digital business requirements grow, scenarios become more complex and diverse, and data volumes increase, traditional databases are under mounting pressure. In big-data services, newer databases can support query and analysis over massive data, but compared with mature databases (such as Oracle, RDS, and MySQL) they offer poor concurrency performance and relatively weak transactions, whereas business requirements call for high concurrency, high throughput, low latency, and transactional guarantees. In some scenarios, traditional databases cannot support advanced queries, and more mature multi-source analytical databases such as Elasticsearch, Solr, ClickHouse, DorisDB, and HBase are needed.
Conventional solutions use multiple database instances together with database and table sharding to support services and to isolate tenant data.
However, in these conventional solutions the business party must maintain the sharding and secondary-index mechanisms itself, unified joint queries over sharded data spread across multiple instances are not possible, only simple queries that follow the data distribution rules can be performed, and development and maintenance costs are high.
Disclosure of Invention
The application provides a data storage method, a data query method, and a device that do not require the business party to maintain database/table sharding or a secondary-index mechanism, and that allow unified joint queries over sharded data across multiple instances through the data query method.
To achieve the above object, the present application provides a data storage method applied to a distributed data management system that includes a plurality of databases. The method includes:
acquiring data to be stored;
storing at least part of the content of the data to be stored into physical tables of each database based on a data routing table of the distributed data management system;
wherein the data routing table records mapping relationships between a logical table and the physical tables of the plurality of databases.
Storing at least part of the content of the data to be stored into the physical table of each database includes:
if writing the data to be stored into at least one database fails, rolling back the data to be stored in the databases into which it was written successfully.
Storing at least part of the content of the data to be stored into the physical tables of each database based on the data routing table of the distributed data management system includes:
storing at least part of the content of the data to be stored into the physical table of each database according to a data storage policy and the data routing table.
To achieve the above object, the present application further provides a data query method applied to a distributed data management system that includes a plurality of databases. The method includes:
acquiring a query statement;
selecting at least one database as a query data source based on the query statement and a data routing table;
querying the data in the query data source to obtain a query result corresponding to the query statement;
wherein the data routing table records mapping relationships between a logical table and the physical tables of the plurality of databases.
Selecting at least one database as the query data source based on the query statement and the data routing table includes:
parsing the query statement to generate a logical plan;
determining a query data range of the query statement based on the logical table and the logical plan;
determining, based on the query data range and the data routing table, at least one database as the query data source corresponding to a physical execution plan.
Determining, based on the query data range and the data routing table, at least one database as the query data source corresponding to the physical execution plan includes:
determining at least one feasible plan based on the query data range, the data routing table, and the storage policy;
determining an execution cost of each feasible plan based on the data routing table;
using the database corresponding to the feasible plan with the lowest execution cost as the query data source.
Determining the execution cost of each feasible plan based on the data routing table includes:
determining the execution cost of each feasible plan based on the data routing table, the storage policy, data source information, data metadata information, cluster configuration, executor status, and/or the data caching policy.
Querying the data in the query data source includes:
if at least part of the data in the query data source has been loaded into a cache layer, querying that data in the cache layer.
To achieve the above object, the present application also provides a data management system, including:
a distributed engine layer, configured to acquire a query statement and select at least one database as a query data source based on the query statement and a data routing table, the data routing table recording mapping relationships between a logical table and the physical tables of the databases; and
an execution layer, configured to query the data in the query data source to obtain a query result corresponding to the query statement.
To achieve the above object, the present application also provides an electronic device comprising a processor configured to execute instructions to implement the above methods.
To achieve the above object, the present application also provides a computer-readable storage medium storing instructions/program data that can be executed to implement the above methods.
When the distributed data management system has a plurality of databases, the method uses the data routing table, which records the mapping between the logical table and the physical tables of the databases, to store at least part of the content of the data to be stored uniformly into all the databases, improving storage efficiency. In existing schemes, each of the multiple databases has its own logical table, so the full data set must be written once per database (i.e., multiple copies of the data must be written), each copy stored separately and in sequence into its database. By contrast, the distributed data management system here has only one logical table and connects all the databases through that logical table and the data routing table, so the data need be written only once; at least part of its content is then stored uniformly into the physical tables of all the databases according to the logical table and the data routing table. This avoids the repeated writes and makes the databases easier to manage.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of an embodiment of a distributed data management system;
FIG. 2 is a schematic flow chart diagram illustrating an embodiment of a data storage method of the present application;
FIG. 3 is a schematic flow chart diagram illustrating another embodiment of a data query method according to the present application;
FIG. 4 is a schematic structural diagram of an embodiment of an electronic device of the present application;
FIG. 5 is a schematic structural diagram of an embodiment of a computer-readable storage medium according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. Additionally, the term "or" as used herein refers to a non-exclusive "or" (i.e., "and/or") unless otherwise indicated (e.g., "or otherwise" or in the alternative). Moreover, the various embodiments described herein are not necessarily mutually exclusive, as some embodiments may be combined with one or more other embodiments to form new embodiments.
The application provides a distributed data management system. As shown in FIG. 1, the distributed data management system may include a plurality of databases, such as those in the storage layer of FIG. 1. Specifically, it may include databases suited to different business scenarios, for example relational databases (e.g., Oracle, MySQL, SQL Server), search databases (e.g., Elasticsearch, Solr), analytical databases (e.g., ClickHouse, TiDB, DorisDB), detail-record databases (e.g., HBase, Cassandra, MongoDB), time-series databases (e.g., InfluxDB, OpenTSDB, VictoriaMetrics), graph databases (e.g., Neo4j, JanusGraph, HugeGraph), and Internet-of-Things databases (e.g., IoTDB, TDengine, HugeGraph).
To make data storage across multiple databases more efficient, the distributed data management system of the application may include a distributed transaction layer and a routing layer.
The distributed transaction layer can be communicatively connected to the routing layer to obtain the data routing table managed by the routing layer. The data routing table holds the mapping relationships between a logical table and the physical tables of the plurality of databases, and the logical table contains the full set of fields. The distributed transaction layer can therefore store at least part of the content of the data to be stored into the physical tables of all the databases based on the obtained data routing table, which improves storage efficiency. Because the distributed transaction layer and the routing layer connect the databases, a business party can use cross-source, multi-domain, multi-instance sharded data storage out of the box without being concerned with how transactions are implemented, which makes data storage in the system more efficient and flexible. In addition, the business party can choose, according to its storage and business requirements, whether to enable global transactions and which transaction level to use.
The routing layer is used for managing a data routing table.
Additionally, a routing layer may be used to manage executor routing tables and/or a redistributed cache, etc.
In addition, the distributed data management system may include a distributed engine layer, which can be communicatively connected to the routing layer and analyzes query statements. Based on that analysis and the data routing table provided by the routing layer, it determines the query data source for the final physical execution plan of a query statement. The data routing table thus makes querying across multiple databases more efficient without the heavy cost of manually associating the databases. The distributed engine layer can also obtain the address of the executor corresponding to the physical execution plan from the executor routing table managed by the routing layer and dispatch the physical execution plan to that executor.
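The executor lookup described above can be pictured with a minimal sketch. It assumes, purely for illustration, that a physical execution plan is split into fragments that each target one database, and that the executor routing table maps a database name to an executor address; `PlanFragment`, `assign_executors`, and the addresses are hypothetical names, and the actual network dispatch is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanFragment:
    """Hypothetical unit of a physical execution plan, bound to one database."""
    database: str
    operation: str


def assign_executors(fragments, executor_routing_table):
    """Pair each plan fragment with the executor address registered for its database.

    executor_routing_table: dict mapping database name -> executor address (assumed shape).
    Raises KeyError when no executor is registered for a fragment's database.
    """
    assignments = []
    for fragment in fragments:
        try:
            address = executor_routing_table[fragment.database]
        except KeyError:
            raise KeyError(f"no executor registered for {fragment.database!r}") from None
        assignments.append((address, fragment))
    return assignments


# Usage: the returned pairs would then be sent to the executors (transport not shown).
routes = {"mysql_01": "10.0.0.11:9000", "es_01": "10.0.0.12:9000"}
plan = [PlanFragment("mysql_01", "scan t_device"), PlanFragment("es_01", "match desc")]
for address, fragment in assign_executors(plan, routes):
    print(address, fragment.operation)
```

Failing loudly on an unknown database, rather than silently dropping the fragment, keeps a misconfigured executor routing table from producing incomplete query results.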
Specifically, the distributed engine layer can also perform intelligent analysis of unified SQL (query statements), including syntax analysis and logical optimization, with awareness of how data is stored and distributed. It optimizes the logical plan according to data source management, data metadata information, execution policies, cluster configuration, the number of executors, data caching policies, and similar mechanisms, finally converts the logical plan into a physical execution plan, and, using in-memory executors and the data routing information, dispatches the plan to multiple distributed executors that carry out the data operations. The distributed engine layer may optimize the logical plan with a rule-based optimizer (RBO, Rule-Based Optimization) and/or a cost-based optimizer (CBO, Cost-Based Optimization) to generate the physical execution plan. The RBO determines the SQL physical execution plan from a set of hard-coded rules covering data storage distribution, policy configuration, cluster configuration, and SQL syntax analysis. The CBO transforms relational expressions according to optimization rules to generate several feasible plans, computes the execution cost of each feasible plan from statistics and a cost model, and selects the plan with the lowest cost as the final physical execution plan.
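The final CBO step, picking the cheapest feasible plan, can be sketched as below. The cost model is a deliberately simple assumption made for illustration (estimated rows touched, times an engine-specific per-row factor, times a discount when the filtered field is indexed); a real optimizer would derive these inputs from collected statistics and a much richer cost model. `FeasiblePlan`, `execution_cost`, `choose_plan`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeasiblePlan:
    """A candidate way of executing the query (hypothetical fields)."""
    database: str
    estimated_rows: int   # rows the plan is expected to touch, taken from statistics
    uses_index: bool      # whether the filtered field is indexed in that physical table
    per_row_cost: float   # engine-specific factor taken from a cost model


def execution_cost(plan: FeasiblePlan) -> float:
    """Toy cost model: rows touched times a per-row factor, discounted when an index applies."""
    index_factor = 0.1 if plan.uses_index else 1.0
    return plan.estimated_rows * plan.per_row_cost * index_factor


def choose_plan(plans):
    """Cost-based choice: return the feasible plan with the lowest estimated cost."""
    if not plans:
        raise ValueError("no feasible plan")
    return min(plans, key=execution_cost)


candidates = [
    FeasiblePlan("mysql_01", estimated_rows=1_000_000, uses_index=False, per_row_cost=1.0),
    FeasiblePlan("es_01", estimated_rows=1_000_000, uses_index=True, per_row_cost=2.0),
    FeasiblePlan("clickhouse_01", estimated_rows=1_000_000, uses_index=False, per_row_cost=0.3),
]
print(choose_plan(candidates).database)  # prints es_01
```

With these made-up figures the indexed plan wins (about 200,000 versus 300,000 and 1,000,000 cost units), even though its per-row factor is the highest; that trade-off is what a cost-based optimizer exists to weigh, whereas a fixed rule would pick one engine regardless of the numbers.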
Preferably, the distributed data management system may further include a cache layer, which can be communicatively connected to the executors and/or the storage layer and is configured to cache data from the databases. When the cache layer holds the data a query needs, the executor can read and query that cached data directly, which returns results faster and thus improves query efficiency. The cache layer may be a cache space built from local cache devices.
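A minimal sketch of the cache-first read path follows. It uses a read-through pattern, which is an assumption chosen for illustration rather than the mechanism stated in the text: a lookup is served from the cache layer when possible, and otherwise it falls back to the database and loads the row into the cache. Plain dicts stand in for both layers, and eviction and invalidation are not modelled. `CachedQueryExecutor` is a hypothetical name.

```python
class CachedQueryExecutor:
    """Serve lookups from a cache layer first, falling back to the database (read-through)."""

    def __init__(self, database, cache=None):
        self.database = database                      # any mapping: key -> row
        self.cache = {} if cache is None else cache   # stand-in for the cache layer
        self.database_reads = 0                       # counts fallbacks to the database

    def get(self, key):
        if key in self.cache:
            return self.cache[key]                    # cache hit: no database access
        self.database_reads += 1
        row = self.database.get(key)
        if row is not None:
            self.cache[key] = row                     # load into the cache for later queries
        return row


executor = CachedQueryExecutor({"dev1": {"temp": 21}})
executor.get("dev1")
executor.get("dev1")
print(executor.database_reads)  # prints 1
```

Only the first lookup reaches the database; the repeat is answered from the cache, which is the speed-up the paragraph describes.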
As shown in fig. 1, the distributed data management system may further include a management and control layer, which may be used for performing data source management, authority management, flexible policy management, data storage management, resource management, metadata management, executor management, route management, and/or operation and maintenance monitoring, and the like.
Through the management and control layer, the distributed data management system can store data streams in diverse ways based on intelligent policy configuration and storage policies. Specifically, the management and control layer may be communicatively connected to the routing layer and can manage and control the data routing table in the routing layer.
In addition, through the management and control layer, the distributed data management system can support data management functions including, but not limited to, data storage policies, data computation policies, tiered storage, replica management, TTL (time-to-live) management, data encryption and decryption, column-level storage management, storage schema management, metadata management, data lifecycle management, data security management, and data sharing. The management and control layer also enables flexible distributed hybrid storage and computation of multi-source, cross-domain, multi-instance sharded data, improving computational performance, throughput, security, and stability.
The distributed data management system may also include an application layer, a gateway layer, a distributed service layer, and an execution layer.
The application layer can be any application device, such as a mobile terminal or a data analysis device. Through the application layer, a user can issue requests such as data storage requests or data query requests, which are then fulfilled by the distributed data management system.
The gateway layer can be communicatively connected to the application layer; it converts requests generated by the application layer into different protocols and forwards them. Protocols the gateway layer can convert include JDBC (Java Database Connectivity) and HTTP (Hypertext Transfer Protocol), and may include other database connection protocols, which are not limited herein.
The distributed service layer can be communicatively connected to the gateway layer to receive the converted requests. When a request is a data query request, the distributed service layer can generate a query statement from it, from which a physical execution plan and then a query result are subsequently produced. The distributed service layer provides flexible and diverse data operation capabilities: multi-source data can be accessed through JDBC, HTTP, file protocols, custom data connectors, and the like; data encryption and decryption and secure data-transmission channels are supported; and a policy engine can synchronize or preload data into the unified storage layer or the cache layer in advance to accelerate computation.
The execution layer corresponds to the executors, which perform operations such as distributed data queries. The execution layer can be communicatively connected to the distributed engine layer, the storage layer, the cache layer, and so on, so that it obtains the physical execution plan from the distributed engine layer and fetches the corresponding data from the storage layer and/or the cache layer to carry out the data operations. The executors can support cloud-native automatic scaling, intelligent data awareness, a "move computation, not data" mechanism, accelerated data operations, and the like.
In the prior art, the business party must itself maintain database/table sharding and a secondary-index mechanism and cannot run unified joint queries over sharded data across multiple instances. To solve this problem, the application provides a data storage method and a data query method applied to the above distributed data management system, through which data can be operated on uniformly across sources, domains, instances, and shards.
Specifically, as shown in fig. 2, an embodiment of the data storage method of the present application includes the following steps. The data storage method is applied to a distributed data management system which can comprise a plurality of databases. It should be noted that the following step numbers are only used for simplifying the description, and are not intended to limit the execution order of the steps, and the execution order of the steps in the present embodiment may be arbitrarily changed without departing from the technical idea of the present application.
S101: Acquire the data to be stored.
The data to be stored is acquired first, so that at least part of its content can then be stored into the physical tables of the databases based on the mapping relationships, recorded in the data routing table of the distributed data management system, between the logical table and the physical tables of the databases.
S102: Store at least part of the content of the data to be stored into the physical table of each database based on the data routing table of the distributed data management system.
After the data to be stored is acquired, at least part of its content can be stored into the physical tables of all the databases based on those mapping relationships. The relevant content of the data to be stored is thus added uniformly to all databases of the distributed data management system via the data routing table: the business party does not need to maintain each database itself, and the data routing table determines which content goes to which database, which prevents content from being omitted or written redundantly and makes the databases easier to manage.
Because the logical table records the full set of fields and the data routing table records the mapping between the logical table and the physical tables of the databases, the logical table and the data routing table together determine which content of the data to be stored must be stored in which database.
Further, the data routing table may also record information such as the index status of the fields in the physical tables mapped from the logical table (i.e., which fields in a physical table are indexed and which are not), and the permissions or type of each physical table (e.g., analytical, retrieval, or detail).
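One possible shape for a data routing table entry is sketched below. The field names are assumptions for illustration: each logical table carries its full field set and a list of physical tables, and each physical table records its database, type, and which of its fields are indexed. `PhysicalTable`, `RoutingEntry`, and the database names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalTable:
    """One physical table that a logical table is routed to (illustrative fields)."""
    database: str                                # hypothetical database/instance name
    table: str
    kind: str                                    # e.g. "analytic", "retrieval", "detail"
    columns: dict = field(default_factory=dict)  # field name -> True if indexed


@dataclass
class RoutingEntry:
    """Routing-table entry: one logical table, its full field set, and its physical tables."""
    logical_table: str
    fields: set
    physical_tables: list

    def __post_init__(self):
        # Every physical column must exist in the logical table's full field set.
        for pt in self.physical_tables:
            extra = set(pt.columns) - self.fields
            if extra:
                raise ValueError(f"{pt.database}.{pt.table} has unknown fields {sorted(extra)}")

    def databases_holding(self, name):
        """Databases whose physical table stores the given field."""
        return [pt.database for pt in self.physical_tables if name in pt.columns]

    def databases_indexing(self, name):
        """Databases whose physical table stores the field and indexes it."""
        return [pt.database for pt in self.physical_tables if pt.columns.get(name)]


device = RoutingEntry(
    "device",
    {"id", "name", "desc", "price"},
    [
        PhysicalTable("hbase_01", "t_device", "detail", {"id": True, "name": False, "price": False}),
        PhysicalTable("es_01", "idx_device", "retrieval", {"id": False, "desc": True}),
    ],
)
print(device.databases_holding("id"))     # prints ['hbase_01', 'es_01']
print(device.databases_indexing("desc"))  # prints ['es_01']
```

Validating against the logical field set when an entry is built catches a physical table that claims a field the logical table does not define.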
Preferably, in step S102, at least part of the content of the data to be stored is stored precisely into the physical tables of the respective databases based on both the data routing table and a data storage policy. The data storage policy governs how different fields are distributed, so the fields contained in each database's physical table can be determined from the data storage policy and the data routing table, and hence which field values of the data to be stored go to which database.
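How one logical record could be split into per-database rows is sketched below. It assumes, for illustration only, that the routing table gives a physical table name per database, that the storage policy gives the set of logical fields each database should hold, and that a primary-key field (`id` by default) is copied into every row so the pieces can later be joined. `route_record` and all names in the example are hypothetical.

```python
def route_record(record, routing, storage_policy, key_field="id"):
    """Split one logical record into per-database rows.

    record:         dict of logical field -> value (one row of the logical table)
    routing:        database name -> physical table name (from the data routing table)
    storage_policy: database name -> set of logical fields that database should hold
    key_field:      assumed primary key, copied into every row so rows can be re-joined
    Returns {(database, physical_table): row}; databases that receive no fields are skipped.
    """
    writes = {}
    for database, table in routing.items():
        wanted = storage_policy.get(database, set())
        payload = {name: value for name, value in record.items() if name in wanted}
        if payload:
            writes[(database, table)] = {key_field: record[key_field], **payload}
    return writes


record = {"id": 1, "name": "lamp", "desc": "smart lamp", "price": 9.9}
routing = {"mysql_01": "t_device", "es_01": "idx_device", "ch_01": "t_device_stats"}
policy = {"mysql_01": {"name", "price"}, "es_01": {"desc"}}
for target, row in route_record(record, routing, policy).items():
    print(target, row)
```

The business side writes the record once against the logical table; the routing table and storage policy decide the fan-out, so `ch_01`, which the policy assigns no fields, receives nothing.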
Preferably, to keep storage consistent across databases (i.e., so that the corresponding content of every piece of data is present in each of the relevant databases), when the distributed data management system writes the corresponding content of the data to be stored into each database according to the data routing table and the write to at least one database fails, the data already written to the databases that succeeded is rolled back. Thus, unless the data is written into all databases, its content is stored in none of them, which guarantees storage consistency across all databases in the system. Further, after the rollback, the distributed data management system may repeat the step of writing the corresponding content into each database according to the data routing table, so that the data is eventually written into all the databases.
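The all-or-nothing write can be illustrated with a compensating-rollback sketch. This is a simplification assumed for illustration, not the transaction protocol of the source: each database is an in-memory stand-in, a failed write raises `OSError`, and rollback deletes the rows already written. The delete-based undo assumes the key did not exist beforehand; a production system would use a proper distributed transaction mechanism instead. `InMemoryDatabase` and `store_everywhere` are hypothetical names.

```python
class InMemoryDatabase:
    """Stand-in for one database; fail_on_write simulates a write error."""

    def __init__(self, name, fail_on_write=False):
        self.name = name
        self.rows = {}
        self.fail_on_write = fail_on_write

    def insert(self, key, row):
        if self.fail_on_write:
            raise OSError(f"write to {self.name} failed")
        self.rows[key] = row

    def delete(self, key):
        self.rows.pop(key, None)


def store_everywhere(key, rows_by_db, databases):
    """Write one record's per-database rows; undo successful writes if any write fails.

    Returns True when every database accepted its row, False after a rollback.
    Assumes the key was absent before the call, so deleting it restores the old state.
    """
    written = []
    try:
        for name, row in rows_by_db.items():
            databases[name].insert(key, row)
            written.append(name)
    except OSError:
        for name in reversed(written):
            databases[name].delete(key)  # compensating rollback, newest write first
        return False
    return True


dbs = {"mysql_01": InMemoryDatabase("mysql_01"), "es_01": InMemoryDatabase("es_01", fail_on_write=True)}
print(store_everywhere(1, {"mysql_01": {"name": "lamp"}, "es_01": {"desc": "smart lamp"}}, dbs))
print(dbs["mysql_01"].rows)  # prints {} because the successful write was rolled back
```

The retry that the text describes would simply call `store_everywhere` again, possibly with a bounded number of attempts, after a `False` result.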
In summary, when the distributed data management system has a plurality of databases, the method and system use the data routing table, which records the mapping between the logical table and the physical tables of the databases, to store at least part of the content of the data to be stored uniformly into all the databases, improving storage efficiency. In existing schemes where each database has its own logical table, the full data set must be written once per database (i.e., multiple copies of the data), each copy stored separately and in sequence into its database. Here the distributed data management system has a single logical table and connects all the databases through that logical table and the data routing table, so the data need be written only once, and at least part of its content is stored uniformly into the physical tables of all the databases. This avoids the repeated writes and makes the databases easier to manage.
The data storage method described above may be performed by a distributed transaction layer in a distributed data management system.
In addition, the business party can choose whether to use this data storage scheme as needed. While the scheme is disabled, the data to be stored is stored only in a database designated by the business party. When the scheme is enabled, the data stored in that designated database during the disabled period is synchronized to the other databases of the distributed data management system, ensuring storage consistency across all of its databases.
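The catch-up synchronization performed when the scheme is switched on might look like the sketch below. It assumes, for illustration, that the designated database holds full rows keyed by primary key, that each other database is a key-to-row mapping, and that the storage policy lists the fields each database keeps. Rows already present are left untouched, and databases that hold none of a row's fields are skipped. `sync_on_enable` and the database names are hypothetical.

```python
def sync_on_enable(designated, others, storage_policy):
    """Copy rows written only to the designated database into the other databases.

    designated:     dict key -> full row (data stored while the scheme was disabled)
    others:         dict database name -> dict key -> row
    storage_policy: database name -> set of fields that database holds
    Returns the number of rows copied.
    """
    copied = 0
    for key, row in designated.items():
        for name, table in others.items():
            if key in table:
                continue  # already present: leave the existing row as it is
            wanted = storage_policy.get(name, set())
            projected = {f: v for f, v in row.items() if f in wanted}
            if projected:
                table[key] = projected
                copied += 1
    return copied


designated = {1: {"name": "a", "desc": "d"}, 2: {"name": "b", "desc": "e"}}
others = {"es_01": {1: {"desc": "d"}}, "ch_01": {}}
print(sync_on_enable(designated, others, {"es_01": {"desc"}}))  # prints 1
```

Only row 2 is missing from `es_01`, so exactly one row is copied; `ch_01` has no fields assigned by the policy and stays empty.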
Specifically, as shown in fig. 3, an embodiment of the data query method of the present application includes the following steps. The data query method is applied to a distributed data management system which can comprise a plurality of databases. It should be noted that the following step numbers are only used for simplifying the description, and are not intended to limit the execution order of the steps, and the execution order of the steps in the present embodiment may be arbitrarily changed without departing from the technical idea of the present application.
S201: a query statement is obtained.
The query statement is acquired first, so that at least one database can subsequently be selected as the query data source based on the query statement and the data routing table, and the data in that source can then be queried to obtain the query result corresponding to the query statement.
In the distributed data management system of the present application, in step S201, a query statement may be obtained by a distributed engine layer.
The query statement may be generated before it is acquired. Generating the query statement may specifically include: acquiring a user's natural-language query; and converting it into a standard query statement (e.g., an SQL statement).
In the distributed data management system, standard query statements may be generated by the application layer, the gateway layer, and the distributed service layer. Specifically, the application layer receives the natural-language query issued by the user, the gateway layer forwards it to the distributed service layer, and the distributed service layer converts it into a standard query statement (e.g., an SQL statement).
S202: at least one database is selected as a query data source based on the query statement and the data routing table.
After the query statement is obtained, at least one database can be selected as the query data source based on the query statement and the data routing table, so that the query can subsequently be executed against the data in that query data source to obtain the query result of the query statement.
The data routing table records the mapping relationship between a logical table and the physical tables of all the databases, and the logical table contains the full set of fields, so the overall distribution of the data can be determined from the data routing table. In step S202, a better physical execution plan can therefore be determined from the query statement and the data routing table. This avoids the excessive query cost of existing schemes in which each database has its own logical tables, where every database must be traversed or several databases must be associated manually; in other words, the data routing table makes it simple to determine a better physical execution plan and thereby reduces the query cost.
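To make the routing table concrete, here is a hedged sketch of one possible in-memory layout. `PhysicalTable`, `RoutingTable`, `locate`, and the `indexed`/`kind` attributes are hypothetical names: the text states only that the table maps a logical table holding all fields to the physical tables of the databases, and later that it may record index information and table types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalTable:
    database: str               # hypothetical database name, e.g. "db1"
    table: str                  # physical table name inside that database
    columns: frozenset[str]     # logical-table fields stored in this physical table
    indexed: frozenset[str] = frozenset()
    kind: str = "row"           # hypothetical table-type label


class RoutingTable:
    """Maps one logical table (holding the full set of fields) to its physical tables."""

    def __init__(self, logical: str, fields: set[str], physical: list[PhysicalTable]):
        self.logical = logical
        self.fields = frozenset(fields)
        self.physical = physical
        # Every physical column must correspond to a field of the logical table.
        for p in physical:
            unknown = p.columns - self.fields
            if unknown:
                raise ValueError(f"{p.table} has unmapped columns {sorted(unknown)}")

    def locate(self, field_name: str) -> list[PhysicalTable]:
        """Return every physical table that stores `field_name`."""
        return [p for p in self.physical if field_name in p.columns]
```

With such a structure, the engine can answer "where does field X live?" with one lookup instead of probing every database.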
Step S202 may include: parsing the query statement to generate a logical plan, and optimizing the logical plan based on the data routing table to obtain a physical execution plan. The logical plan describes "what to do", i.e., the sub-goals into which the query statement is decomposed, while the physical execution plan describes concretely "how to do it", i.e., a concrete implementation of those sub-goals.
Specifically, parsing the query statement to generate the logical plan mainly includes: the distributed engine layer performs syntax analysis on the query statement; once syntax parsing succeeds, the query statement is converted into a structured abstract syntax tree (AST), which can serve as the generated logical plan. Further, after the AST is obtained, semantic analysis can be performed on it to check whether the data source of each leaf node exists and whether the SQL computation of each non-leaf node is logically valid, yielding a semantically analyzed AST.
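The sketch below hand-builds an AST for a query such as `SELECT name FROM device WHERE id = 7` and runs the two semantic checks described above: leaf data sources must exist, and non-leaf operations may only reference visible columns. A real system would obtain the tree from an SQL parser, which is omitted here; the node classes, `check_semantics`, and the `catalog` format are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scan:            # leaf node: reads a logical table
    table: str


@dataclass
class Filter:          # non-leaf node: keeps rows where column == value
    child: object
    column: str
    value: object


@dataclass
class Project:         # non-leaf node: keeps only the listed columns
    child: object
    columns: list


def check_semantics(node, catalog: dict[str, set[str]]) -> set[str]:
    """Validate the tree bottom-up and return the columns visible at `node`.

    `catalog` maps logical table name -> its field names (hypothetical input).
    Raises ValueError if a leaf's data source is missing or a column is unknown.
    """
    if isinstance(node, Scan):
        if node.table not in catalog:
            raise ValueError(f"unknown table {node.table!r}")
        return set(catalog[node.table])
    if not isinstance(node, (Filter, Project)):
        raise TypeError(f"unsupported node {type(node).__name__}")
    visible = check_semantics(node.child, catalog)
    if isinstance(node, Filter):
        if node.column not in visible:
            raise ValueError(f"unknown column {node.column!r}")
        return visible
    missing = set(node.columns) - visible
    if missing:
        raise ValueError(f"unknown columns {sorted(missing)}")
    return set(node.columns)
```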
Optimizing the logical plan based on the data routing table to obtain the physical execution plan may specifically include: optimizing the logical plan with a rule-based optimizer (RBO) to obtain an optimized logical plan, and then determining the query data source of the physical execution plan for the query statement based on the data routing table and the optimized logical plan.
When determining the physical execution plan based on the data routing table, the query data range is first determined from the logical plan (or the optimized logical plan) and the logical table, i.e., which fields, columns, and/or rows need to be searched; then, based on the data routing table and the query data range, it is determined which databases to query, i.e., the physical execution plan.
Determining the physical execution plan based on the data routing table and the query data range may include: determining at least one feasible solution based on the query data range and the data routing table; determining the execution cost of each feasible solution based on the data routing table; and taking the feasible solution with the lowest execution cost as the physical execution plan and the databases it uses as the query data source. This yields a better-performing physical execution plan and improves query efficiency.
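As an illustration of rule-based optimization, the sketch below applies two textbook rewrite rules (pushing a filter below a projection, and merging nested projections) bottom-up until the plan stops changing. The text does not say which rules its RBO uses; these rules, the node classes, and `optimize` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Scan:
    table: str


@dataclass
class Filter:
    child: object
    column: str
    value: object


@dataclass
class Project:
    child: object
    columns: list


def push_filter_below_project(node):
    """Rule: Filter(Project(x, cols)) -> Project(Filter(x), cols) if the filter column is kept."""
    if isinstance(node, Filter) and isinstance(node.child, Project):
        proj = node.child
        if node.column in proj.columns:
            return Project(Filter(proj.child, node.column, node.value), proj.columns)
    return node


def merge_projects(node):
    """Rule: Project(Project(x, a), b) -> Project(x, b) when b is a subset of a."""
    if isinstance(node, Project) and isinstance(node.child, Project):
        if set(node.columns) <= set(node.child.columns):
            return Project(node.child.child, node.columns)
    return node


RULES = [push_filter_below_project, merge_projects]


def optimize(node):
    """Rewrite children first, then apply RULES to this node until nothing changes."""
    while True:
        if isinstance(node, Filter):
            node = Filter(optimize(node.child), node.column, node.value)
        elif isinstance(node, Project):
            node = Project(optimize(node.child), node.columns)
        rewritten = node
        for rule in RULES:
            rewritten = rule(rewritten)
        if rewritten == node:
            return node
        node = rewritten
```

Rules rewrite the plan without consulting statistics; choosing among databases is left to the later routing-table and cost steps.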
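Deriving the query data range can be sketched as a walk over the logical plan that collects the output columns, the filtered columns, and the row predicates. `QueryRange`, `query_range`, and the node classes are hypothetical, and only single-table plans are handled.

```python
from dataclasses import dataclass


@dataclass
class Scan:
    table: str


@dataclass
class Filter:
    child: object
    column: str
    value: object


@dataclass
class Project:
    child: object
    columns: list


@dataclass
class QueryRange:
    table: str          # logical table being read
    columns: set        # fields that must be fetched
    predicates: list    # (column, value) pairs restricting rows


def query_range(plan, logical_fields: set) -> QueryRange:
    """Collect the fields and row predicates a single-table logical plan needs.

    `logical_fields` is the full field set of the logical table, used when the
    plan has no projection (i.e. SELECT *).
    """
    output = None           # columns of the outermost projection, if any
    filter_cols = set()
    predicates = []
    node = plan
    while not isinstance(node, Scan):
        if isinstance(node, Project):
            if output is None:
                output = set(node.columns)
        elif isinstance(node, Filter):
            filter_cols.add(node.column)
            predicates.append((node.column, node.value))
        else:
            raise TypeError(f"unsupported node {type(node).__name__}")
        node = node.child
    selected = output if output is not None else set(logical_fields)
    return QueryRange(node.table, selected | filter_cols, predicates)
```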
Specifically, the data routing table shows in which databases the data of the query data range is stored. From this information, at least one combination of databases that together provide all the data in the query data range can be obtained, and one feasible solution can be generated from each such combination; hence at least one feasible solution can be generated based on the data routing table.
Preferably, the at least one feasible solution may be determined based on both the data routing table and the data storage policy. Specifically, the data routing table and the data storage policy together show in which databases the data of the query data range is stored and how it is stored in the physical tables of those databases (e.g., which fields, the storage mode, and/or the storage volume). Combining these, at least one database combination that provides all the data in the query data range can easily be derived, so at least one feasible solution can easily be generated.
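Enumerating feasible solutions amounts to finding database combinations whose stored columns together cover the query data range. In the hedged sketch below, `feasible_solutions` and its `placement` input (database name to stored columns, assumed to be read from the data routing table) are hypothetical. It keeps only minimal combinations and is exponential in the number of candidate databases, which is acceptable only for a handful of them.

```python
from itertools import combinations


def feasible_solutions(needed: set[str], placement: dict[str, set[str]]) -> list[tuple[str, ...]]:
    """Return every minimal set of databases whose columns together cover `needed`.

    `placement` maps database name -> columns of the logical table it stores
    (hypothetical format, derived from the data routing table).
    """
    # Only databases holding at least one needed column can contribute.
    candidates = sorted(db for db, cols in placement.items() if cols & needed)
    found: list[tuple[str, ...]] = []
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            covered = set().union(*(placement[db] for db in combo))
            if not needed <= covered:
                continue
            # Skip supersets of a smaller combination that already works.
            if any(set(f) <= set(combo) for f in found):
                continue
            found.append(combo)
    return found
```

For example, with `db3` holding every field and `db1`/`db2` each holding part of them, both `("db3",)` and `("db1", "db2")` are feasible, and the cost step decides between them.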
When the execution cost of each feasible solution is determined based on the data routing table, it can be computed from information recorded in the routing table, such as the indexes on the fields of the physical tables involved in each feasible solution and/or the types of those physical tables. The distributed engine layer can thus conveniently calculate the execution cost of each feasible solution through the data routing table, which avoids the excessive query cost of existing schemes in which each database has its own logical tables and every database must be traversed or several databases associated manually.
Further, when determining the execution cost of each feasible solution, the storage mode and/or data volume of the corresponding physical tables, as determined by the storage policy, may also be taken into account.
In addition, the execution cost of each feasible solution can be determined comprehensively from data source management information, data metadata information, execution policies, cluster configuration, executor conditions, data caching policies, and the data routing table, so that the cost is estimated accurately, the best-performing physical execution plan can be selected, and data query efficiency is improved. Data source management information covers, for example, the address and storage capacity of each database. Data metadata information records, for each table (the logical table and each physical table), which fields it contains, its permissions, whether it is encrypted, and so on. Execution policies include the storage policy, data cleaning policy, periodic archiving policy, optimization policy, and the like. Cluster configuration may include the storage status and configuration (including memory and CPU) of each database. Executor conditions may include the number of idle executors and the tasks the executors are currently running.
The execution cost of each feasible solution may also be determined by a cost-based optimizer (CBO).
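A toy cost model in the spirit of the paragraph above: it uses only the index and table-type information attributed to the routing table, plus a row count, and picks the feasible solution with the lowest summed cost. The weights (about 1% of rows touched for an indexed predicate, a 0.2 per-column factor for columnar tables), `TableStats`, and the function names are illustrative assumptions, not values from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Per-physical-table facts assumed to be available from the routing table."""
    rows: int
    indexed: frozenset   # columns with an index
    kind: str            # hypothetical labels: "row" or "columnar"


def table_cost(stats: TableStats, filter_cols: set, read_cols: int) -> float:
    # An indexed predicate is assumed to touch about 1% of rows; otherwise a full scan.
    touched = stats.rows * (0.01 if filter_cols & stats.indexed else 1.0)
    # Columnar tables are assumed cheaper per column read (illustrative weight).
    per_col = 0.2 if stats.kind == "columnar" else 1.0
    return touched * max(read_cols, 1) * per_col


def cheapest(solutions, stats_by_db, filter_cols, read_cols):
    """Return (cost, solution) for the feasible solution with the lowest summed cost."""
    scored = [
        (sum(table_cost(stats_by_db[db], filter_cols, read_cols) for db in sol), sol)
        for sol in solutions
    ]
    return min(scored)
```

A production cost-based optimizer would weigh further factors, such as storage mode, data volume, and executor load, and would calibrate its weights from statistics rather than fixed constants.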
S203: a query is performed on the data in the query data source to obtain a query result corresponding to the query statement.
After the query data source corresponding to the physical execution plan of the query statement has been determined through the above steps, the query can be performed on the data in that query data source to obtain the query result corresponding to the query statement.
Specifically, after the query data source corresponding to the physical execution plan is determined, the physical execution plan may be distributed to at least one executor; each executor queries the data of the database corresponding to its assigned part of the plan and obtains the query result of that part, from which the query result of the query statement is obtained. When the physical execution plan is distributed to several executors, the query results of all the executors may be combined to obtain the query result of the query statement.
In addition, if the data of at least some of the databases corresponding to the physical execution plan (i.e., at least some of the query data sources) has been loaded into the cache layer, the executors responsible for those databases can query the cache layer instead, which speeds up the query and lets the distributed data management system return the query result faster.
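The fan-out and result integration can be sketched with Python's standard `concurrent.futures.ThreadPoolExecutor`. The sketch assumes the data is split by columns across databases and that partial rows are integrated by an inner join on a shared key; `run_subplan`, `execute`, and the list-of-dicts databases are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subplan(rows: list[dict], columns: set, key: str, predicate) -> dict:
    """One executor's job: filter its database's rows and keep the requested columns."""
    return {
        r[key]: {c: r[c] for c in columns if c in r}
        for r in rows
        if predicate(r)
    }


def execute(subplans: list[tuple[list[dict], set]], key: str, predicate) -> list[dict]:
    """Fan sub-plans out to executors, then merge the partial rows on `key`."""
    with ThreadPoolExecutor(max_workers=len(subplans)) as pool:
        partials = list(pool.map(lambda sp: run_subplan(sp[0], sp[1], key, predicate), subplans))
    merged: dict = {}
    for part in partials:
        for k, cols in part.items():
            merged.setdefault(k, {key: k}).update(cols)
    # Keep only keys every executor returned (an inner join across partial results).
    common = set.intersection(*(set(p) for p in partials))
    return [merged[k] for k in sorted(common)]
```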
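A minimal sketch of the cache-first lookup: each scan checks whether its data source has been loaded into the cache layer and falls back to the database otherwise. `CacheAwareExecutor` and its hit and miss counters are hypothetical; cache loading and invalidation are out of scope.

```python
class CacheAwareExecutor:
    """Hypothetical executor that prefers a cache layer over the backing database."""

    def __init__(self, databases: dict[str, list[dict]], cache: dict[str, list[dict]]):
        self.databases = databases
        self.cache = cache      # database name -> rows already loaded into the cache layer
        self.hits = 0
        self.misses = 0

    def scan(self, db: str, predicate) -> list[dict]:
        if db in self.cache:
            self.hits += 1
            rows = self.cache[db]       # this source was preloaded into the cache
        else:
            self.misses += 1
            rows = self.databases[db]   # fall back to the database itself
        return [r for r in rows if predicate(r)]
```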
In summary, the query data source corresponding to the physical execution plan of a query statement can be determined from the mapping between the logical table and the physical tables of all databases recorded in the data routing table. A suitable database can therefore be identified and selected automatically as the query data source based on the data routing table and the query statement. This enables unified distributed computation over sharded table data spread across multiple sources, domains, and database instances, while shielding the business side from having to operate on such multi-source, cross-domain, multi-instance sharded tables itself. Because planning is based on the logical table, the plan can be tuned automatically, so the business layer no longer has to select tables, create tables, or construct physical execution plans manually. Moreover, because the execution cost of each feasible solution is determined through the data routing table, there is no need, as in existing schemes where each database has its own logical tables, to traverse the physical tables of every database to determine the cost of each execution plan. The data query method can thus provide a more efficient, stable, and intelligent data service.
Given the variety of databases it can draw on, the method and system can be applied to scenarios such as relational computation, multi-dimensional retrieval services, multi-dimensional analysis, massive detail-record services, time-series services, graph data analysis services, and Internet of Things service analysis.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an embodiment of an electronic device 20 according to the present application. The electronic device 20 of the present application includes a processor 22, and the processor 22 is configured to execute instructions to implement the method of any of the above embodiments of the present application and any non-conflicting combinations thereof.
The electronic device 20 may be a camera or a server, and is not limited herein.
The processor 22 may also be referred to as a CPU (Central Processing Unit). Processor 22 may be an integrated circuit chip having signal processing capabilities. The processor 22 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor 22 may be any conventional processor or the like.
The electronic device 20 may further include a memory 21 for storing instructions and data required for the operation of the processor 22.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present disclosure. The computer-readable storage medium 30 of the embodiments of the present application stores instructions and/or program data 31 that, when executed, perform the method provided by any of the above method embodiments of the present application, or any non-conflicting combination thereof. The instructions/program data 31 may form a program file stored in the storage medium 30 in the form of a software product, so as to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium 30 includes: a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, or a computer, a server, a mobile phone, a tablet, or other devices capable of storing program code.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The above embodiments are merely examples and are not intended to limit the scope of the present disclosure; any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present disclosure, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of the present disclosure.

Claims (11)

1. A data storage method applied to a distributed data management system including a plurality of databases, the method comprising:
acquiring data to be stored;
storing at least part of contents in the data to be stored into physical tables of each database based on a data routing table of the distributed data management system;
wherein the data routing table records mapping relations between the logical tables and the physical tables of the plurality of databases.
2. The data storage method according to claim 1, wherein the storing at least part of the contents of the data to be stored in the physical table of each database comprises:
if writing the data to be stored into at least one database fails, rolling back the data to be stored from each database into which it was written successfully.
3. The data storage method according to claim 1, wherein the storing at least part of the contents of the data to be stored in the physical tables of the databases based on the data routing table of the distributed data management system comprises:
storing at least part of the contents of the data to be stored into the physical tables of the databases according to a data storage policy and the data routing table.
4. A data query method, applied to a distributed data management system, the distributed data management system including a plurality of databases, the method comprising:
acquiring a query statement;
selecting at least one database as a query data source based on the query statement and a data routing table;
querying based on the data in the query data source to obtain a query result corresponding to the query statement;
wherein the data routing table records mapping relations between the logical tables and the physical tables of the plurality of databases.
5. The data query method of claim 4, wherein selecting at least one database as a query data source based on the query statement and a data routing table comprises:
parsing the query statement to generate a logical plan;
determining a query data range of the query statement based on the logical table and the logical plan;
and determining, based on the query data range and the data routing table, at least one database as the query data source corresponding to a physical execution plan.
6. The data query method of claim 5,
the determining at least one database as a query data source corresponding to a physical execution plan based on the query data range and the data routing table includes:
determining at least one feasible solution based on the query data range and the data routing table;
determining an execution cost for each feasible solution based on the data routing table;
and taking the database(s) corresponding to the feasible solution with the lowest execution cost as the query data source.
7. The data query method of claim 6, wherein determining the execution cost for each feasible solution based on the data routing table comprises:
determining an execution cost for each feasible solution based on the data routing table, storage policy, data source information, data metadata information, cluster configuration, executor conditions, and/or data caching policy.
8. The data query method of claim 4, wherein the querying based on the data in the query data source comprises:
if at least part of the data in the query data source has been loaded into a cache layer, querying that data in the cache layer.
9. A data management system, characterized in that the data management system comprises:
a plurality of databases;
a distributed engine layer, configured to acquire a query statement and select at least one database as a query data source based on the query statement and a data routing table, wherein the data routing table records the mapping relationship between the logical table and the physical tables of the plurality of databases;
and an execution layer, configured to perform a query based on the data in the query data source to obtain a query result corresponding to the query statement.
10. An electronic device, characterized in that the electronic device comprises a processor for executing instructions to implement the method of any of claims 1-8.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions and/or program data, which when executed, implement the method of any one of claims 1-8.
CN202210613771.2A 2022-05-31 2022-05-31 Data storage method, data query method and device Pending CN115248829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210613771.2A CN115248829A (en) 2022-05-31 2022-05-31 Data storage method, data query method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210613771.2A CN115248829A (en) 2022-05-31 2022-05-31 Data storage method, data query method and device

Publications (1)

Publication Number Publication Date
CN115248829A true CN115248829A (en) 2022-10-28

Family

ID=83698911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210613771.2A Pending CN115248829A (en) 2022-05-31 2022-05-31 Data storage method, data query method and device

Country Status (1)

Country Link
CN (1) CN115248829A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118210776A (en) * 2024-03-04 2024-06-18 汉江国家实验室 A method, device and storage medium for constructing a sound propagation loss database
CN118590536A (en) * 2024-07-31 2024-09-03 舟谱数据技术南京有限公司 A grayscale release system and control method thereof


Similar Documents

Publication Publication Date Title
US11816126B2 (en) Large scale unstructured database systems
US10963428B2 (en) Multi-range and runtime pruning
CN107402988B (en) Distributed NewSQL database system and semi-structured data query method
JP6617117B2 (en) Scalable analysis platform for semi-structured data
US10579634B2 (en) Apparatus and method for operating a distributed database with foreign tables
US8924373B2 (en) Query plans with parameter markers in place of object identifiers
US11321315B2 (en) Methods and systems for database optimization
US20080281784A1 (en) Query handling in databases with replicated data
US20150205834A1 (en) PROVIDING FILE METADATA QUERIES FOR FILE SYSTEMS USING RESTful APIs
CN104750681A (en) Method and device for processing mass data
JP4483034B2 (en) Heterogeneous data source integrated access method
Vajk et al. Automatic NoSQL schema development: A case study
US20180150544A1 (en) Synchronized updates across multiple database partitions
CN115248829A (en) Data storage method, data query method and device
US12174819B2 (en) Compacted table data files validation
US20080294673A1 (en) Data transfer and storage based on meta-data
D’silva et al. Secondary indexing techniques for key-value stores: Two rings to rule them all
Suganya et al. Efficient fragmentation and allocation in distributed databases
KR20190005578A (en) Systemt and method of managing distributed database based on inmemory
JP2024514672A (en) List-Based Data Retrieval Using Append-Only Data Structures
CN111782655A (en) A near-access method for database cross-regional data based on geographic partitioning
Khandelwal Queries on compressed data
CN110569310A (en) Management method of relational big data in cloud computing environment
Castillo-García et al. A review of horizontal fragmentation methods considering multimedia data and dynamic access patterns
Braun et al. MTBase: Optimizing Cross-Tenant Database Queries

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination