CN112163010B

CN112163010B - Cross-data source query method and device for database

Info

Publication number: CN112163010B
Application number: CN202010868325.7A
Authority: CN
Inventors: 谭彰; 陈英杰; 陈则润; 屈泽中
Original assignee: Lanzhuo Digital Technology Co ltd
Current assignee: Lanzhuo Digital Technology Co ltd
Priority date: 2020-08-26
Filing date: 2020-08-26
Publication date: 2024-04-12
Anticipated expiration: 2040-08-26
Also published as: CN112163010A

Abstract

The invention provides a method and a device for querying a database across data sources, wherein the method for querying the database across data sources comprises the following steps: configuring index attributes of data in each data source according to a preset definition specification, and constructing a query model comprising binding relations; acquiring a data query request of a target index, and generating sub-query tasks corresponding to all data sources according to the data query request; executing sub-query tasks, and obtaining sub-query results in each data source according to the execution results; and merging sub-query results based on the binding relation, and outputting a final query result of the target index. Through unified definition specification, the binding relation among indexes of different data sources is determined, the problem that data definition is not uniform when different data sources are queried is solved, and compared with a data warehouse with a higher construction threshold, the cost for constructing the definition specification is lower.

Description

Cross-data source query method and device for database

Technical Field

The invention belongs to the field of database query, and particularly relates to a method and a device for querying a database across data sources.

Background

In the enterprise management process, a report system needs to be constructed in aspects of business butt joint, performance statistics, manpower management and the like, a plurality of different data sources are often involved in index query of the report system, because the construction of the different data sources comes from different users, the definition of index data is not uniform, the conventional SQL technology is difficult to query across the different data sources, and a complex data warehouse system has great advantages in data processing capacity and data quality, but has high construction cost and lower cost performance for small and medium enterprises.

Disclosure of Invention

In order to solve the defects and shortcomings in the prior art, the invention provides a method and a device for querying a database across data sources, wherein the method for querying the database across data sources comprises the following steps:

configuring index attributes of data in each data source according to a preset definition specification, and constructing a query model comprising binding relations;

acquiring a data query request of a target index, and generating sub-query tasks corresponding to all data sources according to the data query request;

executing sub-query tasks, and obtaining sub-query results in each data source according to the execution results;

and merging sub-query results based on the binding relation, and outputting a final query result of the target index.

Optionally, the preset definition specification includes a scope definition specification, a unit definition specification and an attribute definition specification:

the scope definition specification is used for defining the effective application scope of the index in the database;

the unit definition specification is used for defining units of the indexes and unit conversion relations;

the attribute definition specification is used to define attributes of respective indexes, and indexes having the same meaning contain the same attribute fields.

Optionally, configuring the index attribute of the data in each data source according to a preset definition specification, and constructing a query model including a binding relationship includes:

acquiring attribute fields contained in indexes from different data sources, and merging indexes containing the same attribute fields;

constructing binding relations among indexes through join sentences;

and constructing a query model based on the index merging result and the binding relation.

Optionally, the data query request for obtaining the target index generates sub-query tasks corresponding to each data source according to the data query request, including:

obtaining a filtering condition in a data query request, wherein the filtering condition comprises a filtering index used for limiting a query data range;

analyzing the overlapping part of the index contained in the data source and the filtering index, taking the overlapping part as the filtering condition of the sub-query task corresponding to the data source, and splitting the sub-query task of each data source from the data query request based on the filtering condition of the obtained sub-query task.

Optionally, the merging the sub-query results based on the binding relationship, outputting the final query result of the target index, includes:

acquiring a binding relation between a first index and a second index;

based on the binding relation, combining the query result of the first index with the query result of the second index, and outputting the combined result as a final query result.

Optionally, the cross-data source query method further includes conflict processing for the attribute field and the binding relationship, where the conflict processing includes:

judging whether attribute fields defined for all indexes have naming conflict and whether binding relations among all indexes have relation conflict or not;

if naming conflict exists, determining whether to rename the conflicting attribute field according to the actual requirement of attribute definition, and if not, disabling the attribute field;

if the relation conflicts, the manual adjustment is carried out according to the actual condition of index binding.

The invention also provides a data source crossing query device of the database based on the same thought, and the data source crossing query device comprises:

definition unit: the query model comprises binding relations and is used for configuring index attributes of data in each data source according to preset definition specifications;

task generation unit: the data query request is used for acquiring the target index, and sub-query tasks corresponding to all the data sources are generated according to the data query request;

an execution unit: the sub-query task is used for executing the sub-query task, and sub-query results in each data source are obtained according to the execution results;

a merging unit: and the method is used for merging sub-query results based on the binding relation and outputting a final query result of the target index.

Optionally, the defining unit is specifically configured to:

acquiring attribute fields of indexes from different data sources in a definition specification, and merging indexes containing the same attribute fields;

constructing binding relations among indexes through join sentences;

Optionally, the task generating unit is specifically configured to:

Optionally, the merging unit is specifically configured to:

acquiring a binding relation between a first index and a second index;

The technical scheme provided by the invention has the beneficial effects that:

through unified definition specification, the binding relation among indexes of different data sources is determined, the problem that data definition is not uniform when different data sources are queried is solved, and compared with a data warehouse with a higher construction threshold, the cost for constructing the definition specification is lower.

Drawings

In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a method for querying a database across data sources;

FIG. 2 is a block diagram of an architecture for a database for cross-data source querying;

fig. 3 is a block diagram of a database cross-data source query device according to the present invention.

Detailed Description

In order to make the structure and advantages of the present invention more apparent, the structure of the present invention will be further described with reference to the accompanying drawings.

Example 1

As shown in fig. 1, the present invention provides a method for querying a database across data sources, which includes:

s1: and configuring index attributes of data in each data source according to a preset definition specification, and constructing a query model comprising binding relations.

The preset definition specifications comprise a scope definition specification, a unit definition specification and an attribute definition specification;

the scope definition specification is used to define the effective scope of the index in the database, including the global domain and the topic domain. The constraint range of the global domain is the whole database platform, and the indexes defined as the global domain are universal in the whole database and all represent the same meaning, for example, the order numbers in the service represent the same meaning in the database and are all effective indexes. The constraint range of the topic domain is that each data source in the database, and the index defined as the topic domain is only common inside each data source in the database, such as a market total cost A and a human total cost B, and the index A and the index B are all total costs, but A only represents the total cost of the market, is only valid for the data source from the market and is invalid for the data source from the human.

The unit definition specification is used to define units of the respective indexes and unit conversion relations including types of units, specific units, whether standard units are used, and expressions converted to standard units. The type of unit refers to the kind of unit of the index, such as a length unit, a weight unit, an area unit, and the like. If the unit of the index is not a standard unit set in advance, it is also necessary to define an expression converted to the standard unit in the unit definition specification. If the unit of purchasing M material is ton, but the unit of receiving M material is kg, and the kg is assumed to be the standard unit, therefore, for the index representing purchasing M material, a conversion expression for converting ton into kg needs to be added in the unit definition specification so as to be converted into a unified unit when the index is calculated.

The attribute definition specification is used for defining the attribute of each index, and the indexes with the same meaning contain the same attribute fields, including the scope of the attribute, the attribute name, the Chinese meaning of the attribute, the data type corresponding to the attribute, the unit type and the unit name. Wherein the scope of the attribute is obtained from the scope definition specification, and the unit type and the unit name of the attribute are obtained from the unit definition specification.

In this embodiment, the index definition compiler in the reporting system performs definition configuration on the index according to the definition specification described above. Configuring index attributes of data in each data source according to a preset definition specification, and constructing a query model comprising binding relations, wherein the query model comprises:

the attribute fields of the indexes from different data sources are acquired, and the indexes containing the same attribute fields are combined, namely, the combined indexes have the same meaning as the indexes, although the names of the indexes are different because the indexes come from different data sources, so the indexes are combined into the same index.

Binding relations among the combined indexes are constructed through join sentences, and if the indexes with different attributes have associated requirements, the binding relations among the indexes can be constructed through command operations of left/right/inner/outer join, foreign keys and primary keys, so that a query model is obtained. For example, the job number of an employee and the department of the employee belong to different attributes, but in actual use situations, the employee and the corresponding department of the employee are often required to be queried simultaneously, so that a binding relationship is established between the job number and the department of the employee by using join sentences, and subsequent query is facilitated.

The cross-data source query method also comprises conflict processing on the attribute fields and the binding relations. When configuring the attribute of each index, the index definition compiler automatically checks the defined attribute field to determine whether the attribute field defined for each index has naming conflict with the entity attribute. If naming conflicts exist, the index definition compiler gives a prompt and reminds a configurator to select a conflict resolution strategy according to the actual requirement of attribute definition, whether to rename the conflicting attribute fields is determined, and if not, the attribute fields are deactivated. At the same time, all conflicting attributes require the addition of an alias to distinguish between the data source name or the query model name in which they reside.

And obtaining a query model of the database through the index combination and the binding relation. The query model is a logical mapping to the physical data source, and as can be known from the above description, the query model is used for representing the relationship between the indexes, and can bind the mutual tools of various heterogeneous systems, such as a relational database, a file, an API and the like. A query model is composed of data sources, relationship definitions, and conflict definitions, which may be physical data sources such as relational databases, files, etc., as well as other query models. The relation definition is the binding relation determined in the definition specification, and indexes of different attributes can be bound with each other, and a plurality of data sources or query models can be bound with each other. The conflict definition is the processing result of the conflict of the attribute fields in the definition specification.

The query model is built based on the unified definition specification, the data of various heterogeneous systems are linked, the binding relation in the query model can be directly called when the subsequent cross-data source query is convenient, and the problem of non-uniform data definition from different data sources is solved.

For example, existing metrics from the following data sources are known:

relational library a [ users table ]: id, name, age, dept_id

Relational library B [ user_info Table ] user_id, level [ post level ], reduction, age [ follow-up Table redundancy ]

Each row of the file C [ Login.log file ] can parse out user_id, login_time, logo_time, level [ log level ]

The content of the definition specification is first determined as shown in table 1.

TABLE 1

Attribute names	Attribute Chinese name	Data type
			user_id	User id	int
age	Age of user	int

Combining the indexes according to the attributes defined in the table 1:

and merging the index id of the relation library A, the index user_id of the relation library B and the index user.id of the file C into an index with the attribute of the user_id, and merging the index age of the relation library A and the index work_age of the relation library B into an index with the attribute of the age in the same way.

And defining the relationship between the merged data sources:

C left join B on user_id

C left join A on user_id

through the command statement, in the data of the data attribute user_id, the data belonging to the same user id in the relation library A, the relation library B and the file C are bound together, for example, staff 1 in the relation library A and working time login_time=9 in the file C correspond to user id=1 in the relation library A, the relation library B and the file C respectively, namely, the binding relation between staff with id 1 and working time is established. After conflict processing, a query model about sign-in condition is constructed, and the name of the query model is log_model.

S2: and acquiring a data query request of the target index, and generating sub-query tasks corresponding to the data sources according to the data query request.

The target index is the result which is wanted to be inquired when the cross data source is inquired, and the data related to the target index in the database is screened out through the filtering condition in the data inquiry request. First, filter conditions in a data query request are obtained, the filter conditions including filter metrics for defining a range of query data, i.e., data that are involved in calculating the metrics for the purpose. And then analyzing the overlapping part of the index contained in the data source and the filtering index, taking the overlapping part as the filtering condition of the sub-query task corresponding to the data source, and splitting the sub-query task of each data source from the data query request based on the filtering condition of the obtained sub-query task. That is, the filtering indexes belonging to each specific data source are split from all the filtering indexes, and relevant data are screened in each data source through the split filtering indexes. In this embodiment, the judging condition included in the where statement is set as the relation between each filtering indicator, that is, the style of the where statement is made to be the style of the filtering indicator x and the filtering indicator y and the filtering indicator z, and it is ensured that the filtering indicators included in the where statement of each subtask have matched data in the corresponding data source.

For example, in the log of check-in with the level of 1, the result of average time to work of staff who is learned as a family in each department is queried, and as can be seen from the above description, the average time to work of staff who is learned as a family in each department is the target index. The data query request for obtaining the target index is select avg (logo_time-logo_time) from log_ model where education = 'Gramineae' and level= 1group by dept_id, where education, level =1, depth_id, logo_time, and logo_time are filtering indexes. From the above examples, dept_id is an index in the relational library a [ users table ], reduction and level=1 are indexes in the relational library B [ users_info table ], and logo_time are indexes in the file C [ log file ]. The range of the above-mentioned query is the query model log_model.

Therefore, the command sentences corresponding to the sub-tasks after splitting are respectively:

for relational library A

select id as user_id,dept_id from users

For relational library B

select user_id, level= 1as user_id from user_infos where education = 'family'

For File C

select user_id,logout_time-login_time as tmp_attr1 from login.log where level＝1

S3: and executing the sub-query task, and obtaining sub-query results in each data source according to the execution results.

In the embodiment, for a relational database, the relational database is directly converted into SQL for inquiring, and sub-inquiring results are obtained according to the inquiring results of all the sub-tasks; for a non-relational database, such as a file, after being read row by row, a corresponding query program screens out a corresponding sub-query result according to a filtering condition in a sub-query task, and a person skilled in the art should know how to set the query program, which is not described here again.

S4: and merging sub-query results based on the binding relation, and outputting a final query result of the target index.

Acquiring a binding relation between a first index and a second index;

For example, according to the subtask split result in S2, a result merge command is generated based on the query model log_model:

select avg(tmp_attrs),dept_id from login.log inner join user_info left_join users group by dept_id

the tmp_attrs is the query result of the subtask of the file C split in S2, i.e., tmp_attrs=logo_time-login_time.

And merging the employee shift time tmp_attrs inquired in the file C with employee identity information user_id in the relation library A and the relation library B in a one-to-one correspondence manner through inner join and left_join sentences. The working time of the staff is the first index, and the staff identity information is the second index. The data containing the same field with the file C in the relation library A and the relation library B are bound together in the previously constructed query model log_model, each queried employee and the corresponding working time thereof are combined together based on the binding relation between the working time of the employee and the identity information of the employee in the log_model, and the identity information of the same employee in different naming forms in the relation library A, the relation library B and the file C are combined together based on the binding relation obtained in the definition unit 51.

The following describes the process of the above-described cross-data-source query method in connection with a specific example, which is implemented by the database architecture shown in fig. 2, where the database architecture includes an interactive interface, an index definition compiler, a task manager, an index database, a schedule database, and data sources #1, #2, and #3. The index definition compiler is used for performing index definition and solidification configuration according to a preset definition rule system. The definition rule system comprises a scope, a unit main rule and an attribute main rule, and is used for defining a specification. And outputting the compiling result of the index definition compiler to a task manager, and calling a scheduling database and information in the index database by the task manager when cross-data source query is performed, wherein the scheduling database is used for inquiring scheduling configuration of a task, and cross-data source query is performed among the data source #1, the data source #2 and the data source #3 according to the scheduling configuration result.

The example specifically includes:

step one: the requester initiates a data query request and inputs a query command.

Step two: the database distributes the computing nodes to the data query requests through the task manager, and the requesters keep the progress update synchronous with the computing nodes.

Step three: and the computing node compiles the query command through the index definition compiler, and generates a planner program module according to the compiling result.

Step four: splitting the data query request into subtasks corresponding to the data source #1, the data source #2 and the data source #3 respectively by the planner program module, calling query resources in the scheduling database by the task manager, and executing the subtasks.

Step five: and merging results of all the data source subtasks by the planner program module, and returning the merged results to the requester.

Example two

As shown in fig. 3, the present invention proposes a database cross-data source query device 5, including:

definition unit 51: and the query model comprising binding relations is constructed by configuring index attributes of data in each data source according to preset definition specifications.

In this embodiment, the index definition compiler in the reporting system performs definition configuration on the index according to the definition specification described above.

The query model is built based on the same definition specification, the data of various heterogeneous systems are linked, the binding relation in the query model can be directly called when the subsequent cross-data source query is convenient, and the problem of non-uniform data definition from different data sources is solved.

For example, existing metrics from the following data sources are known:

relational library a [ users table ]: id, name, age, dept_id

TABLE 1

Combining the indexes according to the attributes defined in the table 1:

And defining the relationship between the merged data sources:

C left join B on user_id

C left join A on user_id

Task generating section 52: and the data query request is used for acquiring the target index, and sub-query tasks corresponding to the data sources are generated according to the data query request.

The target index is the result which is wanted to be inquired when the cross data source is inquired, and the data related to the target index in the database is screened out through the filtering condition in the data inquiry request. First, filter conditions in a data query request are obtained, the filter conditions including filter metrics for defining a range of query data, the filter metrics being data that are involved in computing a target metric. And then analyzing the overlapping part of the index contained in the data source and the filtering index, taking the overlapping part as the filtering condition of the sub-query task corresponding to the data source, and splitting the sub-query task of each data source from the data query request based on the filtering condition of the obtained sub-query task. That is, the filtering indexes belonging to each specific data source are split from all the filtering indexes, and relevant data are screened in each data source through the split filtering indexes. In the embodiment, the judging condition contained in the sphere statement is set as the relation between each filtering index, namely, the style of the sphere statement is the style of the filtering index x and the filtering index y and the filtering index z, and the filtering indexes contained in the sphere statement of each subtask are ensured to have matched data in the corresponding data sources.

For example, in the log of check-in with the level of 1, the result of average time to work of staff who is learned as a family in each department is queried, and as can be seen from the above description, the average time to work of staff who is learned as a family in each department is the target index. The data query request selection avg (logo_time-logo_time) from log_ model where education = 'Gramineae' and level= 1group by dept_id, where education, dept _id and logo_time, logo_time are filtering indexes. From the above examples, dept_id is an index in the relational library a [ users table ], reduction and level=1 are indexes in the relational library B [ users_info table ], and logo_time are indexes in the file C [ log file ]. The range of the above-mentioned query is the query model log_model.

for relational library A

select id as dept_id from users

For relational library B

select user_id, level= 1as user_id from user_infos where education = 'family'

For File C

select user_id,logout_time-login_time as tmp_attr from login.log where level＝1

Execution unit 53: and the sub-query task is used for executing the sub-query task, and sub-query results in each data source are obtained according to the execution results.

Merging unit 54: and the method is used for merging sub-query results based on the binding relation and outputting a final query result of the target index.

Acquiring a binding relation between a first index and a second index;

For example, from the subtask split result in the task generating unit 52, a result merging command is generated based on the query model log_model:

where tmp_attrs is the subtask query result of file C, i.e., tmp_attrs=logo_time-log_time.

The following describes the process of the above-described cross-data-source query method in connection with a specific example, which is implemented by the database architecture shown in fig. 2, where the database architecture includes an interactive interface, an index definition compiler, a task manager, an index database, a schedule database, and data sources #1, #2, and #3. The index definition compiler is used for performing index definition and solidification configuration according to a preset definition rule system. The definition rule system comprises a scope, a unit main rule and an attribute main rule. And outputting the compiling result of the index definition compiler to a task manager, and calling a scheduling database and information in the index database by the task manager when cross-data source query is performed, wherein the scheduling database is used for inquiring scheduling configuration of a task, and cross-data source query is performed among the data source #1, the data source #2 and the data source #3 according to the scheduling configuration result.

The various numbers in the above embodiments are for illustration only and do not represent the order of assembly or use of the various components.

The foregoing is illustrative of the present invention and is not to be construed as limiting thereof, but rather, the present invention is to be construed as limited to the appended claims.

Claims

1. The cross-data source query method for the database is characterized by comprising the following steps of:

merging sub-query results based on the binding relation, and outputting a final query result of the target index;

the preset definition specifications comprise a scope definition specification, a unit definition specification and an attribute definition specification:

the attribute definition specification is used for defining the attribute of each index, and the indexes with the same meaning contain the same attribute fields;

configuring index attributes of data in each data source according to a preset definition specification, and constructing a query model comprising binding relations, wherein the query model comprises:

constructing binding relations among indexes through join sentences;

constructing a query model based on the index merging result and the binding relation;

the merging of sub-query results based on the binding relationship, outputting the final query result of the target index, includes:

acquiring a binding relation between a first index and a second index;

2. The method for querying a database across data sources according to claim 1, wherein the obtaining the data query request of the destination index generates sub-query tasks corresponding to each data source according to the data query request, and the method comprises:

3. The method of claim 1, further comprising conflict handling for attribute fields and binding relationships, the conflict handling comprising:

4. A cross-data source query device for a database, adapted to a cross-data source query method for a database according to any one of claims 1 to 3, wherein the cross-data source query device comprises:

5. The database cross-data source query device according to claim 4, wherein the definition unit is specifically configured to:

constructing binding relations among indexes through join sentences;

6. The database cross-data source query device according to claim 4, wherein the task generating unit is specifically configured to:

7. The device for querying a database across data sources according to claim 4, wherein the merging unit is specifically configured to:

acquiring a binding relation between a first index and a second index;