[go: up one dir, main page]

CN112163010B - Cross-data source query method and device for database - Google Patents

Cross-data source query method and device for database Download PDF

Info

Publication number
CN112163010B
CN112163010B CN202010868325.7A CN202010868325A CN112163010B CN 112163010 B CN112163010 B CN 112163010B CN 202010868325 A CN202010868325 A CN 202010868325A CN 112163010 B CN112163010 B CN 112163010B
Authority
CN
China
Prior art keywords
query
index
data
sub
data source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010868325.7A
Other languages
Chinese (zh)
Other versions
CN112163010A (en
Inventor
谭彰
陈英杰
陈则润
屈泽中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lanzhuo Digital Technology Co ltd
Original Assignee
Lanzhuo Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lanzhuo Digital Technology Co ltd filed Critical Lanzhuo Digital Technology Co ltd
Priority to CN202010868325.7A priority Critical patent/CN112163010B/en
Publication of CN112163010A publication Critical patent/CN112163010A/en
Application granted granted Critical
Publication of CN112163010B publication Critical patent/CN112163010B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/248Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for querying a database across data sources, wherein the method for querying the database across data sources comprises the following steps: configuring index attributes of data in each data source according to a preset definition specification, and constructing a query model comprising binding relations; acquiring a data query request of a target index, and generating sub-query tasks corresponding to all data sources according to the data query request; executing sub-query tasks, and obtaining sub-query results in each data source according to the execution results; and merging sub-query results based on the binding relation, and outputting a final query result of the target index. Through unified definition specification, the binding relation among indexes of different data sources is determined, the problem that data definition is not uniform when different data sources are queried is solved, and compared with a data warehouse with a higher construction threshold, the cost for constructing the definition specification is lower.

Description

Cross-data source query method and device for database
Technical Field
The invention belongs to the field of database query, and particularly relates to a method and a device for querying a database across data sources.
Background
In the enterprise management process, a report system needs to be constructed in aspects of business butt joint, performance statistics, manpower management and the like, a plurality of different data sources are often involved in index query of the report system, because the construction of the different data sources comes from different users, the definition of index data is not uniform, the conventional SQL technology is difficult to query across the different data sources, and a complex data warehouse system has great advantages in data processing capacity and data quality, but has high construction cost and lower cost performance for small and medium enterprises.
Disclosure of Invention
In order to solve the defects and shortcomings in the prior art, the invention provides a method and a device for querying a database across data sources, wherein the method for querying the database across data sources comprises the following steps:
configuring index attributes of data in each data source according to a preset definition specification, and constructing a query model comprising binding relations;
acquiring a data query request of a target index, and generating sub-query tasks corresponding to all data sources according to the data query request;
executing sub-query tasks, and obtaining sub-query results in each data source according to the execution results;
and merging sub-query results based on the binding relation, and outputting a final query result of the target index.
Optionally, the preset definition specification includes a scope definition specification, a unit definition specification and an attribute definition specification:
the scope definition specification is used for defining the effective application scope of the index in the database;
the unit definition specification is used for defining units of the indexes and unit conversion relations;
the attribute definition specification is used to define attributes of respective indexes, and indexes having the same meaning contain the same attribute fields.
Optionally, configuring the index attribute of the data in each data source according to a preset definition specification, and constructing a query model including a binding relationship includes:
acquiring attribute fields contained in indexes from different data sources, and merging indexes containing the same attribute fields;
constructing binding relations among indexes through join sentences;
and constructing a query model based on the index merging result and the binding relation.
Optionally, the data query request for obtaining the target index generates sub-query tasks corresponding to each data source according to the data query request, including:
obtaining a filtering condition in a data query request, wherein the filtering condition comprises a filtering index used for limiting a query data range;
analyzing the overlapping part of the index contained in the data source and the filtering index, taking the overlapping part as the filtering condition of the sub-query task corresponding to the data source, and splitting the sub-query task of each data source from the data query request based on the filtering condition of the obtained sub-query task.
Optionally, the merging the sub-query results based on the binding relationship, outputting the final query result of the target index, includes:
acquiring a binding relation between a first index and a second index;
based on the binding relation, combining the query result of the first index with the query result of the second index, and outputting the combined result as a final query result.
Optionally, the cross-data source query method further includes conflict processing for the attribute field and the binding relationship, where the conflict processing includes:
judging whether attribute fields defined for all indexes have naming conflict and whether binding relations among all indexes have relation conflict or not;
if naming conflict exists, determining whether to rename the conflicting attribute field according to the actual requirement of attribute definition, and if not, disabling the attribute field;
if the relation conflicts, the manual adjustment is carried out according to the actual condition of index binding.
The invention also provides a data source crossing query device of the database based on the same thought, and the data source crossing query device comprises:
definition unit: the query model comprises binding relations and is used for configuring index attributes of data in each data source according to preset definition specifications;
task generation unit: the data query request is used for acquiring the target index, and sub-query tasks corresponding to all the data sources are generated according to the data query request;
an execution unit: the sub-query task is used for executing the sub-query task, and sub-query results in each data source are obtained according to the execution results;
a merging unit: and the method is used for merging sub-query results based on the binding relation and outputting a final query result of the target index.
Optionally, the defining unit is specifically configured to:
acquiring attribute fields of indexes from different data sources in a definition specification, and merging indexes containing the same attribute fields;
constructing binding relations among indexes through join sentences;
and constructing a query model based on the index merging result and the binding relation.
Optionally, the task generating unit is specifically configured to:
obtaining a filtering condition in a data query request, wherein the filtering condition comprises a filtering index used for limiting a query data range;
analyzing the overlapping part of the index contained in the data source and the filtering index, taking the overlapping part as the filtering condition of the sub-query task corresponding to the data source, and splitting the sub-query task of each data source from the data query request based on the filtering condition of the obtained sub-query task.
Optionally, the merging unit is specifically configured to:
acquiring a binding relation between a first index and a second index;
based on the binding relation, combining the query result of the first index with the query result of the second index, and outputting the combined result as a final query result.
The technical scheme provided by the invention has the beneficial effects that:
through unified definition specification, the binding relation among indexes of different data sources is determined, the problem that data definition is not uniform when different data sources are queried is solved, and compared with a data warehouse with a higher construction threshold, the cost for constructing the definition specification is lower.
Drawings
In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method for querying a database across data sources;
FIG. 2 is a block diagram of an architecture for a database for cross-data source querying;
fig. 3 is a block diagram of a database cross-data source query device according to the present invention.
Detailed Description
In order to make the structure and advantages of the present invention more apparent, the structure of the present invention will be further described with reference to the accompanying drawings.
Example 1
As shown in fig. 1, the present invention provides a method for querying a database across data sources, which includes:
s1: and configuring index attributes of data in each data source according to a preset definition specification, and constructing a query model comprising binding relations.
The preset definition specifications comprise a scope definition specification, a unit definition specification and an attribute definition specification;
the scope definition specification is used to define the effective scope of the index in the database, including the global domain and the topic domain. The constraint range of the global domain is the whole database platform, and the indexes defined as the global domain are universal in the whole database and all represent the same meaning, for example, the order numbers in the service represent the same meaning in the database and are all effective indexes. The constraint range of the topic domain is that each data source in the database, and the index defined as the topic domain is only common inside each data source in the database, such as a market total cost A and a human total cost B, and the index A and the index B are all total costs, but A only represents the total cost of the market, is only valid for the data source from the market and is invalid for the data source from the human.
The unit definition specification is used to define units of the respective indexes and unit conversion relations including types of units, specific units, whether standard units are used, and expressions converted to standard units. The type of unit refers to the kind of unit of the index, such as a length unit, a weight unit, an area unit, and the like. If the unit of the index is not a standard unit set in advance, it is also necessary to define an expression converted to the standard unit in the unit definition specification. If the unit of purchasing M material is ton, but the unit of receiving M material is kg, and the kg is assumed to be the standard unit, therefore, for the index representing purchasing M material, a conversion expression for converting ton into kg needs to be added in the unit definition specification so as to be converted into a unified unit when the index is calculated.
The attribute definition specification is used for defining the attribute of each index, and the indexes with the same meaning contain the same attribute fields, including the scope of the attribute, the attribute name, the Chinese meaning of the attribute, the data type corresponding to the attribute, the unit type and the unit name. Wherein the scope of the attribute is obtained from the scope definition specification, and the unit type and the unit name of the attribute are obtained from the unit definition specification.
In this embodiment, the index definition compiler in the reporting system performs definition configuration on the index according to the definition specification described above. Configuring index attributes of data in each data source according to a preset definition specification, and constructing a query model comprising binding relations, wherein the query model comprises:
the attribute fields of the indexes from different data sources are acquired, and the indexes containing the same attribute fields are combined, namely, the combined indexes have the same meaning as the indexes, although the names of the indexes are different because the indexes come from different data sources, so the indexes are combined into the same index.
Binding relations among the combined indexes are constructed through join sentences, and if the indexes with different attributes have associated requirements, the binding relations among the indexes can be constructed through command operations of left/right/inner/outer join, foreign keys and primary keys, so that a query model is obtained. For example, the job number of an employee and the department of the employee belong to different attributes, but in actual use situations, the employee and the corresponding department of the employee are often required to be queried simultaneously, so that a binding relationship is established between the job number and the department of the employee by using join sentences, and subsequent query is facilitated.
The cross-data source query method also comprises conflict processing on the attribute fields and the binding relations. When configuring the attribute of each index, the index definition compiler automatically checks the defined attribute field to determine whether the attribute field defined for each index has naming conflict with the entity attribute. If naming conflicts exist, the index definition compiler gives a prompt and reminds a configurator to select a conflict resolution strategy according to the actual requirement of attribute definition, whether to rename the conflicting attribute fields is determined, and if not, the attribute fields are deactivated. At the same time, all conflicting attributes require the addition of an alias to distinguish between the data source name or the query model name in which they reside.
And obtaining a query model of the database through the index combination and the binding relation. The query model is a logical mapping to the physical data source, and as can be known from the above description, the query model is used for representing the relationship between the indexes, and can bind the mutual tools of various heterogeneous systems, such as a relational database, a file, an API and the like. A query model is composed of data sources, relationship definitions, and conflict definitions, which may be physical data sources such as relational databases, files, etc., as well as other query models. The relation definition is the binding relation determined in the definition specification, and indexes of different attributes can be bound with each other, and a plurality of data sources or query models can be bound with each other. The conflict definition is the processing result of the conflict of the attribute fields in the definition specification.
The query model is built based on the unified definition specification, the data of various heterogeneous systems are linked, the binding relation in the query model can be directly called when the subsequent cross-data source query is convenient, and the problem of non-uniform data definition from different data sources is solved.
For example, existing metrics from the following data sources are known:
relational library a [ users table ]: id, name, age, dept_id
Relational library B [ user_info Table ] user_id, level [ post level ], reduction, age [ follow-up Table redundancy ]
Each row of the file C [ Login.log file ] can parse out user_id, login_time, logo_time, level [ log level ]
The content of the definition specification is first determined as shown in table 1.
TABLE 1
Attribute names Attribute Chinese name Data type
user_id User id int
age Age of user int
Combining the indexes according to the attributes defined in the table 1:
and merging the index id of the relation library A, the index user_id of the relation library B and the index user.id of the file C into an index with the attribute of the user_id, and merging the index age of the relation library A and the index work_age of the relation library B into an index with the attribute of the age in the same way.
And defining the relationship between the merged data sources:
C left join B on user_id
C left join A on user_id
through the command statement, in the data of the data attribute user_id, the data belonging to the same user id in the relation library A, the relation library B and the file C are bound together, for example, staff 1 in the relation library A and working time login_time=9 in the file C correspond to user id=1 in the relation library A, the relation library B and the file C respectively, namely, the binding relation between staff with id 1 and working time is established. After conflict processing, a query model about sign-in condition is constructed, and the name of the query model is log_model.
S2: and acquiring a data query request of the target index, and generating sub-query tasks corresponding to the data sources according to the data query request.
The target index is the result which is wanted to be inquired when the cross data source is inquired, and the data related to the target index in the database is screened out through the filtering condition in the data inquiry request. First, filter conditions in a data query request are obtained, the filter conditions including filter metrics for defining a range of query data, i.e., data that are involved in calculating the metrics for the purpose. And then analyzing the overlapping part of the index contained in the data source and the filtering index, taking the overlapping part as the filtering condition of the sub-query task corresponding to the data source, and splitting the sub-query task of each data source from the data query request based on the filtering condition of the obtained sub-query task. That is, the filtering indexes belonging to each specific data source are split from all the filtering indexes, and relevant data are screened in each data source through the split filtering indexes. In this embodiment, the judging condition included in the where statement is set as the relation between each filtering indicator, that is, the style of the where statement is made to be the style of the filtering indicator x and the filtering indicator y and the filtering indicator z, and it is ensured that the filtering indicators included in the where statement of each subtask have matched data in the corresponding data source.
For example, in the log of check-in with the level of 1, the result of average time to work of staff who is learned as a family in each department is queried, and as can be seen from the above description, the average time to work of staff who is learned as a family in each department is the target index. The data query request for obtaining the target index is select avg (logo_time-logo_time) from log_ model where education = 'Gramineae' and level= 1group by dept_id, where education, level =1, depth_id, logo_time, and logo_time are filtering indexes. From the above examples, dept_id is an index in the relational library a [ users table ], reduction and level=1 are indexes in the relational library B [ users_info table ], and logo_time are indexes in the file C [ log file ]. The range of the above-mentioned query is the query model log_model.
Therefore, the command sentences corresponding to the sub-tasks after splitting are respectively:
for relational library A
select id as user_id,dept_id from users
For relational library B
select user_id, level= 1as user_id from user_infos where education = 'family'
For File C
select user_id,logout_time-login_time as tmp_attr1 from login.log where level=1
S3: and executing the sub-query task, and obtaining sub-query results in each data source according to the execution results.
In the embodiment, for a relational database, the relational database is directly converted into SQL for inquiring, and sub-inquiring results are obtained according to the inquiring results of all the sub-tasks; for a non-relational database, such as a file, after being read row by row, a corresponding query program screens out a corresponding sub-query result according to a filtering condition in a sub-query task, and a person skilled in the art should know how to set the query program, which is not described here again.
S4: and merging sub-query results based on the binding relation, and outputting a final query result of the target index.
Acquiring a binding relation between a first index and a second index;
based on the binding relation, combining the query result of the first index with the query result of the second index, and outputting the combined result as a final query result.
For example, according to the subtask split result in S2, a result merge command is generated based on the query model log_model:
select avg(tmp_attrs),dept_id from login.log inner join user_info left_join users group by dept_id
the tmp_attrs is the query result of the subtask of the file C split in S2, i.e., tmp_attrs=logo_time-login_time.
And merging the employee shift time tmp_attrs inquired in the file C with employee identity information user_id in the relation library A and the relation library B in a one-to-one correspondence manner through inner join and left_join sentences. The working time of the staff is the first index, and the staff identity information is the second index. The data containing the same field with the file C in the relation library A and the relation library B are bound together in the previously constructed query model log_model, each queried employee and the corresponding working time thereof are combined together based on the binding relation between the working time of the employee and the identity information of the employee in the log_model, and the identity information of the same employee in different naming forms in the relation library A, the relation library B and the file C are combined together based on the binding relation obtained in the definition unit 51.
The following describes the process of the above-described cross-data-source query method in connection with a specific example, which is implemented by the database architecture shown in fig. 2, where the database architecture includes an interactive interface, an index definition compiler, a task manager, an index database, a schedule database, and data sources #1, #2, and #3. The index definition compiler is used for performing index definition and solidification configuration according to a preset definition rule system. The definition rule system comprises a scope, a unit main rule and an attribute main rule, and is used for defining a specification. And outputting the compiling result of the index definition compiler to a task manager, and calling a scheduling database and information in the index database by the task manager when cross-data source query is performed, wherein the scheduling database is used for inquiring scheduling configuration of a task, and cross-data source query is performed among the data source #1, the data source #2 and the data source #3 according to the scheduling configuration result.
The example specifically includes:
step one: the requester initiates a data query request and inputs a query command.
Step two: the database distributes the computing nodes to the data query requests through the task manager, and the requesters keep the progress update synchronous with the computing nodes.
Step three: and the computing node compiles the query command through the index definition compiler, and generates a planner program module according to the compiling result.
Step four: splitting the data query request into subtasks corresponding to the data source #1, the data source #2 and the data source #3 respectively by the planner program module, calling query resources in the scheduling database by the task manager, and executing the subtasks.
Step five: and merging results of all the data source subtasks by the planner program module, and returning the merged results to the requester.
Example two
As shown in fig. 3, the present invention proposes a database cross-data source query device 5, including:
definition unit 51: and the query model comprising binding relations is constructed by configuring index attributes of data in each data source according to preset definition specifications.
The preset definition specifications comprise a scope definition specification, a unit definition specification and an attribute definition specification;
the scope definition specification is used to define the effective scope of the index in the database, including the global domain and the topic domain. The constraint range of the global domain is the whole database platform, and the indexes defined as the global domain are universal in the whole database and all represent the same meaning, for example, the order numbers in the service represent the same meaning in the database and are all effective indexes. The constraint range of the topic domain is that each data source in the database, and the index defined as the topic domain is only common inside each data source in the database, such as a market total cost A and a human total cost B, and the index A and the index B are all total costs, but A only represents the total cost of the market, is only valid for the data source from the market and is invalid for the data source from the human.
The unit definition specification is used to define units of the respective indexes and unit conversion relations including types of units, specific units, whether standard units are used, and expressions converted to standard units. The type of unit refers to the kind of unit of the index, such as a length unit, a weight unit, an area unit, and the like. If the unit of the index is not a standard unit set in advance, it is also necessary to define an expression converted to the standard unit in the unit definition specification. If the unit of purchasing M material is ton, but the unit of receiving M material is kg, and the kg is assumed to be the standard unit, therefore, for the index representing purchasing M material, a conversion expression for converting ton into kg needs to be added in the unit definition specification so as to be converted into a unified unit when the index is calculated.
The attribute definition specification is used for defining the attribute of each index, and the indexes with the same meaning contain the same attribute fields, including the scope of the attribute, the attribute name, the Chinese meaning of the attribute, the data type corresponding to the attribute, the unit type and the unit name. Wherein the scope of the attribute is obtained from the scope definition specification, and the unit type and the unit name of the attribute are obtained from the unit definition specification.
In this embodiment, the index definition compiler in the reporting system performs definition configuration on the index according to the definition specification described above.
In this embodiment, the index definition compiler in the reporting system performs definition configuration on the index according to the definition specification described above. Configuring index attributes of data in each data source according to a preset definition specification, and constructing a query model comprising binding relations, wherein the query model comprises:
the attribute fields of the indexes from different data sources are acquired, and the indexes containing the same attribute fields are combined, namely, the combined indexes have the same meaning as the indexes, although the names of the indexes are different because the indexes come from different data sources, so the indexes are combined into the same index.
Binding relations among the combined indexes are constructed through join sentences, and if the indexes with different attributes have associated requirements, the binding relations among the indexes can be constructed through command operations of left/right/inner/outer join, foreign keys and primary keys, so that a query model is obtained. For example, the job number of an employee and the department of the employee belong to different attributes, but in actual use situations, the employee and the corresponding department of the employee are often required to be queried simultaneously, so that a binding relationship is established between the job number and the department of the employee by using join sentences, and subsequent query is facilitated.
The cross-data source query method also comprises conflict processing on the attribute fields and the binding relations. When configuring the attribute of each index, the index definition compiler automatically checks the defined attribute field to determine whether the attribute field defined for each index has naming conflict with the entity attribute. If naming conflicts exist, the index definition compiler gives a prompt and reminds a configurator to select a conflict resolution strategy according to the actual requirement of attribute definition, whether to rename the conflicting attribute fields is determined, and if not, the attribute fields are deactivated. At the same time, all conflicting attributes require the addition of an alias to distinguish between the data source name or the query model name in which they reside.
And obtaining a query model of the database through the index combination and the binding relation. The query model is a logical mapping to the physical data source, and as can be known from the above description, the query model is used for representing the relationship between the indexes, and can bind the mutual tools of various heterogeneous systems, such as a relational database, a file, an API and the like. A query model is composed of data sources, relationship definitions, and conflict definitions, which may be physical data sources such as relational databases, files, etc., as well as other query models. The relation definition is the binding relation determined in the definition specification, and indexes of different attributes can be bound with each other, and a plurality of data sources or query models can be bound with each other. The conflict definition is the processing result of the conflict of the attribute fields in the definition specification.
The query model is built based on the same definition specification, the data of various heterogeneous systems are linked, the binding relation in the query model can be directly called when the subsequent cross-data source query is convenient, and the problem of non-uniform data definition from different data sources is solved.
For example, existing metrics from the following data sources are known:
relational library a [ users table ]: id, name, age, dept_id
Relational library B [ user_info Table ] user_id, level [ post level ], reduction, age [ follow-up Table redundancy ]
Each row of the file C [ Login.log file ] can parse out user_id, login_time, logo_time, level [ log level ]
The content of the definition specification is first determined as shown in table 1.
TABLE 1
Attribute names Attribute Chinese name Data type
user_id User id int
age Age of user int
Combining the indexes according to the attributes defined in the table 1:
and merging the index id of the relation library A, the index user_id of the relation library B and the index user.id of the file C into an index with the attribute of the user_id, and merging the index age of the relation library A and the index work_age of the relation library B into an index with the attribute of the age in the same way.
And defining the relationship between the merged data sources:
C left join B on user_id
C left join A on user_id
through the command statement, in the data of the data attribute user_id, the data belonging to the same user id in the relation library A, the relation library B and the file C are bound together, for example, staff 1 in the relation library A and working time login_time=9 in the file C correspond to user id=1 in the relation library A, the relation library B and the file C respectively, namely, the binding relation between staff with id 1 and working time is established. After conflict processing, a query model about sign-in condition is constructed, and the name of the query model is log_model.
Task generating section 52: and the data query request is used for acquiring the target index, and sub-query tasks corresponding to the data sources are generated according to the data query request.
The target index is the result which is wanted to be inquired when the cross data source is inquired, and the data related to the target index in the database is screened out through the filtering condition in the data inquiry request. First, filter conditions in a data query request are obtained, the filter conditions including filter metrics for defining a range of query data, the filter metrics being data that are involved in computing a target metric. And then analyzing the overlapping part of the index contained in the data source and the filtering index, taking the overlapping part as the filtering condition of the sub-query task corresponding to the data source, and splitting the sub-query task of each data source from the data query request based on the filtering condition of the obtained sub-query task. That is, the filtering indexes belonging to each specific data source are split from all the filtering indexes, and relevant data are screened in each data source through the split filtering indexes. In the embodiment, the judging condition contained in the sphere statement is set as the relation between each filtering index, namely, the style of the sphere statement is the style of the filtering index x and the filtering index y and the filtering index z, and the filtering indexes contained in the sphere statement of each subtask are ensured to have matched data in the corresponding data sources.
For example, in the log of check-in with the level of 1, the result of average time to work of staff who is learned as a family in each department is queried, and as can be seen from the above description, the average time to work of staff who is learned as a family in each department is the target index. The data query request selection avg (logo_time-logo_time) from log_ model where education = 'Gramineae' and level= 1group by dept_id, where education, dept _id and logo_time, logo_time are filtering indexes. From the above examples, dept_id is an index in the relational library a [ users table ], reduction and level=1 are indexes in the relational library B [ users_info table ], and logo_time are indexes in the file C [ log file ]. The range of the above-mentioned query is the query model log_model.
Therefore, the command sentences corresponding to the sub-tasks after splitting are respectively:
for relational library A
select id as dept_id from users
For relational library B
select user_id, level= 1as user_id from user_infos where education = 'family'
For File C
select user_id,logout_time-login_time as tmp_attr from login.log where level=1
Execution unit 53: and the sub-query task is used for executing the sub-query task, and sub-query results in each data source are obtained according to the execution results.
In the embodiment, for a relational database, the relational database is directly converted into SQL for inquiring, and sub-inquiring results are obtained according to the inquiring results of all the sub-tasks; for a non-relational database, such as a file, after being read row by row, a corresponding query program screens out a corresponding sub-query result according to a filtering condition in a sub-query task, and a person skilled in the art should know how to set the query program, which is not described here again.
Merging unit 54: and the method is used for merging sub-query results based on the binding relation and outputting a final query result of the target index.
Acquiring a binding relation between a first index and a second index;
based on the binding relation, combining the query result of the first index with the query result of the second index, and outputting the combined result as a final query result.
For example, from the subtask split result in the task generating unit 52, a result merging command is generated based on the query model log_model:
select avg(tmp_attrs),dept_id from login.log inner join user_info left_join users group by dept_id
where tmp_attrs is the subtask query result of file C, i.e., tmp_attrs=logo_time-log_time.
And merging the employee shift time tmp_attrs inquired in the file C with employee identity information user_id in the relation library A and the relation library B in a one-to-one correspondence manner through inner join and left_join sentences. The working time of the staff is the first index, and the staff identity information is the second index. The data containing the same field with the file C in the relation library A and the relation library B are bound together in the previously constructed query model log_model, each queried employee and the corresponding working time thereof are combined together based on the binding relation between the working time of the employee and the identity information of the employee in the log_model, and the identity information of the same employee in different naming forms in the relation library A, the relation library B and the file C are combined together based on the binding relation obtained in the definition unit 51.
The following describes the process of the above-described cross-data-source query method in connection with a specific example, which is implemented by the database architecture shown in fig. 2, where the database architecture includes an interactive interface, an index definition compiler, a task manager, an index database, a schedule database, and data sources #1, #2, and #3. The index definition compiler is used for performing index definition and solidification configuration according to a preset definition rule system. The definition rule system comprises a scope, a unit main rule and an attribute main rule. And outputting the compiling result of the index definition compiler to a task manager, and calling a scheduling database and information in the index database by the task manager when cross-data source query is performed, wherein the scheduling database is used for inquiring scheduling configuration of a task, and cross-data source query is performed among the data source #1, the data source #2 and the data source #3 according to the scheduling configuration result.
The various numbers in the above embodiments are for illustration only and do not represent the order of assembly or use of the various components.
The foregoing is illustrative of the present invention and is not to be construed as limiting thereof, but rather, the present invention is to be construed as limited to the appended claims.

Claims (7)

1. The cross-data source query method for the database is characterized by comprising the following steps of:
configuring index attributes of data in each data source according to a preset definition specification, and constructing a query model comprising binding relations;
acquiring a data query request of a target index, and generating sub-query tasks corresponding to all data sources according to the data query request;
executing sub-query tasks, and obtaining sub-query results in each data source according to the execution results;
merging sub-query results based on the binding relation, and outputting a final query result of the target index;
the preset definition specifications comprise a scope definition specification, a unit definition specification and an attribute definition specification:
the scope definition specification is used for defining the effective application scope of the index in the database;
the unit definition specification is used for defining units of the indexes and unit conversion relations;
the attribute definition specification is used for defining the attribute of each index, and the indexes with the same meaning contain the same attribute fields;
configuring index attributes of data in each data source according to a preset definition specification, and constructing a query model comprising binding relations, wherein the query model comprises:
acquiring attribute fields contained in indexes from different data sources, and merging indexes containing the same attribute fields;
constructing binding relations among indexes through join sentences;
constructing a query model based on the index merging result and the binding relation;
the merging of sub-query results based on the binding relationship, outputting the final query result of the target index, includes:
acquiring a binding relation between a first index and a second index;
based on the binding relation, combining the query result of the first index with the query result of the second index, and outputting the combined result as a final query result.
2. The method for querying a database across data sources according to claim 1, wherein the obtaining the data query request of the destination index generates sub-query tasks corresponding to each data source according to the data query request, and the method comprises:
obtaining a filtering condition in a data query request, wherein the filtering condition comprises a filtering index used for limiting a query data range;
analyzing the overlapping part of the index contained in the data source and the filtering index, taking the overlapping part as the filtering condition of the sub-query task corresponding to the data source, and splitting the sub-query task of each data source from the data query request based on the filtering condition of the obtained sub-query task.
3. The method of claim 1, further comprising conflict handling for attribute fields and binding relationships, the conflict handling comprising:
judging whether attribute fields defined for all indexes have naming conflict and whether binding relations among all indexes have relation conflict or not;
if naming conflict exists, determining whether to rename the conflicting attribute field according to the actual requirement of attribute definition, and if not, disabling the attribute field;
if the relation conflicts, the manual adjustment is carried out according to the actual condition of index binding.
4. A cross-data source query device for a database, adapted to a cross-data source query method for a database according to any one of claims 1 to 3, wherein the cross-data source query device comprises:
definition unit: the query model comprises binding relations and is used for configuring index attributes of data in each data source according to preset definition specifications;
task generation unit: the data query request is used for acquiring the target index, and sub-query tasks corresponding to all the data sources are generated according to the data query request;
an execution unit: the sub-query task is used for executing the sub-query task, and sub-query results in each data source are obtained according to the execution results;
a merging unit: and the method is used for merging sub-query results based on the binding relation and outputting a final query result of the target index.
5. The database cross-data source query device according to claim 4, wherein the definition unit is specifically configured to:
acquiring attribute fields of indexes from different data sources in a definition specification, and merging indexes containing the same attribute fields;
constructing binding relations among indexes through join sentences;
and constructing a query model based on the index merging result and the binding relation.
6. The database cross-data source query device according to claim 4, wherein the task generating unit is specifically configured to:
obtaining a filtering condition in a data query request, wherein the filtering condition comprises a filtering index used for limiting a query data range;
analyzing the overlapping part of the index contained in the data source and the filtering index, taking the overlapping part as the filtering condition of the sub-query task corresponding to the data source, and splitting the sub-query task of each data source from the data query request based on the filtering condition of the obtained sub-query task.
7. The device for querying a database across data sources according to claim 4, wherein the merging unit is specifically configured to:
acquiring a binding relation between a first index and a second index;
based on the binding relation, combining the query result of the first index with the query result of the second index, and outputting the combined result as a final query result.
CN202010868325.7A 2020-08-26 2020-08-26 Cross-data source query method and device for database Active CN112163010B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010868325.7A CN112163010B (en) 2020-08-26 2020-08-26 Cross-data source query method and device for database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010868325.7A CN112163010B (en) 2020-08-26 2020-08-26 Cross-data source query method and device for database

Publications (2)

Publication Number Publication Date
CN112163010A CN112163010A (en) 2021-01-01
CN112163010B true CN112163010B (en) 2024-04-12

Family

ID=73860187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010868325.7A Active CN112163010B (en) 2020-08-26 2020-08-26 Cross-data source query method and device for database

Country Status (1)

Country Link
CN (1) CN112163010B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113448986B (en) * 2021-09-01 2022-03-01 阿里云计算有限公司 Query method, query device, storage medium and program product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5367675A (en) * 1991-12-13 1994-11-22 International Business Machines Corporation Computer automated system and method for optimizing the processing of a query in a relational database system by merging subqueries with the query
US5511186A (en) * 1992-11-18 1996-04-23 Mdl Information Systems, Inc. System and methods for performing multi-source searches over heterogeneous databases
CN107491561A (en) * 2017-09-25 2017-12-19 北京航空航天大学 A kind of urban transportation heterogeneous data integrated system and method based on body
CN109710638A (en) * 2019-01-01 2019-05-03 湖南大学 A multi-query optimization method on federated distributed RDF database
CN111339334A (en) * 2020-02-11 2020-06-26 支付宝(杭州)信息技术有限公司 Data query method and system for heterogeneous graph database

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7584178B2 (en) * 2006-04-20 2009-09-01 International Business Machines Corporation Query condition building using predefined query objects
US9846712B2 (en) * 2015-04-25 2017-12-19 International Business Machines Corporation Index-only multi-index access

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5367675A (en) * 1991-12-13 1994-11-22 International Business Machines Corporation Computer automated system and method for optimizing the processing of a query in a relational database system by merging subqueries with the query
US5511186A (en) * 1992-11-18 1996-04-23 Mdl Information Systems, Inc. System and methods for performing multi-source searches over heterogeneous databases
CN107491561A (en) * 2017-09-25 2017-12-19 北京航空航天大学 A kind of urban transportation heterogeneous data integrated system and method based on body
CN109710638A (en) * 2019-01-01 2019-05-03 湖南大学 A multi-query optimization method on federated distributed RDF database
CN111339334A (en) * 2020-02-11 2020-06-26 支付宝(杭州)信息技术有限公司 Data query method and system for heterogeneous graph database

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
关联数据联合查询处理技术研究;米培元;信息科技;20120715;全文 *
基于XQuery的异构数据源查询处理;严小泉;刘渊;;计算机工程;20090720(14);全文 *
数据仓库跨版本透明查询系统的设计;鲍蓉;;计算机工程;20090120(02);全文 *

Also Published As

Publication number Publication date
CN112163010A (en) 2021-01-01

Similar Documents

Publication Publication Date Title
US7716167B2 (en) System and method for automatically building an OLAP model in a relational database
US9213698B1 (en) Unified data architecture for business process management and data modeling
Naderifar et al. A review on conformance checking technique for the evaluation of process mining algorithms
Saltz et al. Exploring the process of doing data science via an ethnographic study of a media advertising company
CN115221337B (en) Data weaving processing method, device, electronic device and readable storage medium
KR102639897B1 (en) Data architecture management system
CA2772824C (en) Role mapping and training tool
WO2022253165A1 (en) Scheduling method, system, server and computer readable storage medium
CN118608340B (en) Information management method based on school management cloud service platform
US20100031263A1 (en) Process model lean notation
CN112633621B (en) Power grid enterprise management decision-making system and method based on PAAS platform
CN112163010B (en) Cross-data source query method and device for database
Dombrowski et al. Knowledge graphs for an automated information provision in the factory planning
Abrahiem A new generation of middleware solutions for a near-real-time data warehousing architecture
Gavriilidis et al. Towards a Modular Data Management System Framework.
US9305066B2 (en) System and method for remote data harmonization
US20240095243A1 (en) Column-based union pruning
CN108470076A (en) Data resource planning system and method and data management system
US20120317081A1 (en) Detecting wasteful data collection
CN116561114A (en) A Management Method Based on Metadata
CN115618825A (en) Financial statement merging method and device, computer readable medium and terminal equipment
Tannen et al. Provenance for Database Transformations.
Niedermann Deep Business Optimization: concepts and architecture for an analytical business process optimization platform
Simitsis et al. Logical optimization of ETL workflows
CN103530368B (en) Demand data decomposition method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Room 150 (1-1-180), Changpu Road, jishigang Town, Haishu District, Ningbo City, Zhejiang Province

Applicant after: Lanzhuo Digital Technology Co.,Ltd.

Address before: Room 150 (1-1-180), Changpu Road, jishigang Town, Haishu District, Ningbo City, Zhejiang Province

Applicant before: ZHEJIANG LANZHUO INDUSTRIAL INTERNET INFORMATION TECHNOLOGY Co.,Ltd.

Country or region before: China

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant