CN113094444B - Data processing method, data processing device, computer equipment and medium - Google Patents
Data processing method, data processing device, computer equipment and medium
- Publication number
- CN113094444B (application CN202010024591.1A)
- Authority
- CN
- China
- Prior art keywords
- aggregation
- screening
- dimension
- tables
- aggregate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure provides a data processing method, including: acquiring a data source, wherein the data source includes at least one dimension item and at least one index item; aggregating the data source multiple times to obtain a plurality of aggregation tables; receiving a query request from a client, and screening the plurality of aggregation tables to obtain an aggregation table that meets the query request; and sending the screened aggregation table that meets the query request to the client. The present disclosure also provides a data processing apparatus, a computer device, and a computer-readable storage medium.
Description
Technical Field
The present disclosure relates to the field of computer technology, and more particularly, to a data processing method, a data processing apparatus, a computer device, and a medium.
Background
In the related art, when a user sends a query request to a data warehouse, a data source and an aggregation table need to be specified. However, as service requirements change, the dimensions analyzed by the user also change, so that many aggregation tables end up being aggregated from a single data source. An aggregation table in an existing data warehouse can be regarded as a materialized view or index in a database. After each query request, the user must select, from a large number of returned aggregation tables, the table that meets the query requirement and offers the best query performance, which requires the user to have a clear understanding of the structure and model of the data tables. This increases both the learning cost for the user and the maintenance cost of the server.
Disclosure of Invention
In view of this, the present disclosure provides a data processing method, a data processing apparatus, a computer device, and a medium.
One aspect of the present disclosure provides a data processing method, including: acquiring a data source, wherein the data source includes at least one dimension item and at least one index item; aggregating the data source multiple times to obtain a plurality of aggregation tables; receiving a query request from a client, and screening the plurality of aggregation tables to obtain an aggregation table that meets the query request; and sending the screened aggregation table that meets the query request to the client.
According to an embodiment of the present disclosure, aggregating the data source multiple times to obtain a plurality of aggregation tables includes: for any one of the multiple aggregations, determining a dimension item for that aggregation from the at least one dimension item, determining an index item for that aggregation from the at least one index item, and determining an aggregation function for that aggregation; and aggregating the data source based on the dimension item, the index item, and the aggregation function for that aggregation to obtain the aggregation table for that aggregation.
According to an embodiment of the present disclosure, the data source undergoes a data update and a version update once every predetermined period. The version of any one of the plurality of aggregation tables is the same as the version of the data source from which that aggregation table was aggregated.
According to an embodiment of the present disclosure, the query request includes a query version. Screening the plurality of aggregation tables to obtain the aggregation table that meets the query request includes: screening the plurality of aggregation tables according to the query version to obtain the aggregation tables that pass a first screening.
According to an embodiment of the present disclosure, the query request further includes a specified dimension item and a specified index item. Screening the plurality of aggregation tables to obtain the aggregation table that meets the query request further includes: screening, from the aggregation tables that pass the first screening, the aggregation tables that contain the specified dimension item and the specified index item, to obtain the aggregation tables that pass a second screening.
According to an embodiment of the present disclosure, the query request further includes a filter condition, and the filter condition includes the specified dimension item and a value of the specified dimension item. Screening the plurality of aggregation tables to obtain the aggregation table that meets the query request further includes: for any aggregation table among the aggregation tables that pass the second screening, searching the dimension items in that aggregation table in a predetermined order; matching each dimension item found against the specified dimension item, and, if the match succeeds, determining a weight for that dimension item and continuing to the next dimension item, or, if the match fails, ending the search; and then determining a score for that aggregation table based on the weights of the dimension items so determined. After the score of each aggregation table that passes the second screening has been determined, the aggregation table with the highest score is selected as the aggregation table that passes a third screening.
According to an embodiment of the present disclosure, screening the plurality of aggregation tables to obtain the aggregation table that meets the query request further includes: screening, from the aggregation tables that pass the third screening, the aggregation table with the smallest number of dimension items and/or the smallest number of pieces of data.
Another aspect of the present disclosure provides a data processing apparatus comprising an acquisition module, an aggregation module, a receiving module, a screening module, and a sending module. The acquisition module is configured to acquire a data source, wherein the data source includes at least one dimension item and at least one index item. The aggregation module is configured to aggregate the data source multiple times to obtain a plurality of aggregation tables, wherein the dimension items of the aggregation tables differ and/or the index items of the aggregation tables differ. The receiving module is configured to receive a query request from a client. The screening module is configured to screen, from the plurality of aggregation tables, the aggregation table that meets the query request. The sending module is configured to send the aggregation table that meets the query request to the client.
Another aspect of the present disclosure provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method as described above when executing the program.
Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed, are configured to implement a method as described above.
Another aspect of the present disclosure provides a computer program comprising computer executable instructions which when executed are for implementing a method as described above.
According to embodiments of the present disclosure, when a user needs to query aggregated data for certain dimensions and certain indexes, the aggregation table that best fits the query request is screened, based on the query request, from a plurality of aggregation tables obtained by preprocessing, and the screened aggregation table is then returned to the client. Unlike the related art, a plurality of aggregation tables need not be returned to the client, so the user does not have to choose among returned aggregation tables. The user can obtain the data of interest as quickly as possible without incurring substantial learning and usage costs, which meets the user's requirements.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments thereof with reference to the accompanying drawings in which:
FIG. 1 schematically illustrates an exemplary system architecture to which a data processing method and a data processing apparatus may be applied according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a data processing method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a flow chart of a data processing method according to another embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of a data processing apparatus according to an embodiment of the present disclosure; and
Fig. 5 schematically illustrates a block diagram of a computer device according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where an expression analogous to "at least one of A, B and C, etc." is used, it should in general be interpreted as one skilled in the art would ordinarily understand it (e.g., "a system having at least one of A, B and C" would include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). Where an expression analogous to "at least one of A, B or C, etc." is used, it should likewise be interpreted as one skilled in the art would ordinarily understand it (e.g., "a system having at least one of A, B or C" would include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
Embodiments of the present disclosure provide a data processing method and a data processing apparatus. The method includes a preparation process, a receiving process, and a routing process. The preparation process may be divided into an acquisition process and an aggregation process. In the acquisition process, a data source is acquired, wherein the data source includes at least one dimension item and at least one index item. In the aggregation process, the acquired data source is aggregated multiple times to obtain a plurality of aggregation tables, wherein the dimension items of the aggregation tables differ and/or the index items of the aggregation tables differ. After the preparation process is finished, the receiving process can be performed, in which a query request from a client is received. The routing process is then performed, and may be divided into a screening process and a sending process. In the screening process, the aggregation table that meets the query request is screened from the plurality of aggregation tables. In the sending process, the screened aggregation table that meets the query request is sent to the corresponding client.
FIG. 1 schematically illustrates an exemplary system architecture 100 in which data processing methods and data processing apparatus may be applied, according to embodiments of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in FIG. 1, the system architecture 100 may include a terminal device 101, a network 102, a primary server 103, and secondary servers 104, 105, 106, where servers at different levels, or different servers at the same level, may provide data related to different services.
Network 102 is the medium used to provide a communication link between terminal device 101 and primary server 103. Network 102 may include various connection types, such as wired or wireless communication links, or fiber optic cables. The primary server 103 may communicate with the secondary servers 104, 105, 106 via various wired or wireless communication links.
The terminal device 101 may be a variety of electronic devices including, but not limited to, smart phones, personal computers, tablet computers, smart watches, etc., without limitation.
A client application (hereinafter referred to as "client") having various functions can be installed in the terminal device 101. The functional support for each client in the terminal device 101 can be distributed across the servers at each level. For example, one client in the terminal device 101 has an advertisement delivery function: the user sends an advertisement delivery request to the primary server 103 through the terminal device 101, and the primary server 103 delivers advertisements for different services through different secondary servers. After placing advertisements, users frequently check how those advertisements are performing. The primary server 103 therefore needs to collect, from the secondary servers 104, 105, 106, data that characterizes the effect of the advertising, and push the data to the terminal device 101 in the form of a report for presentation to the user. This example uses an advertisement delivery scenario; in other examples, the present disclosure can be applied to various other service scenarios, which are not limited herein.
It should be understood that the number of terminal devices, networks, primary and secondary servers, and the number of tiers of servers in fig. 1 are merely illustrative. Any number may be provided according to actual needs.
Currently, online analytical processing (OLAP) technology is widely used in data analysis and intelligent decision making, and OLAP technology can be applied to the above process of collecting data and forming reports.
OLAP technology mainly uses data warehouses for data storage and querying. Although data warehouses differ considerably in architecture design and implementation details because of their different application scenarios, they all allow users to analyze multidimensional data from various angles. The most common such analysis is Roll-Up, which denotes aggregating data along a dimension according to a certain rule. Aggregation occurs along a dimension, that is, within a hierarchical relationship, from a child dimension to a parent dimension. In actual engineering practice, to meet the needs of different analysis dimensions, it is often necessary to build multiple Roll-Up tables (also referred to as "aggregation tables") on the same data source. These tables have different table schemas and are distinguished by different names. When a user wants to analyze data from a new dimension and the existing Roll-Up tables cannot meet the requirement, a new Roll-Up table needs to be built for the dimension being analyzed, and a new request needs to be constructed with the new Roll-Up table name in order to acquire the data.
When a user sends a query request to a data warehouse, a data source and a Roll-Up table need to be specified. However, as service requirements change, the dimensions analyzed by the user also change, so that many aggregation tables end up being aggregated from a single data source. A Roll-Up table in an existing data warehouse can be regarded as a materialized view or index in a database. After each query request, the user must select, from a large number of returned Roll-Up tables, the table that meets the query requirement and offers the best query performance. The user therefore needs a clear understanding of the schema and data model of the Roll-Up tables, which increases both the learning cost for the user and the maintenance cost of the server.
When making a query request, the user does not actually care how many Roll-Up tables exist on the specified data source or what the schema of each Roll-Up table is. The user only cares about the data results, under the requested dimensions and indexes, that satisfy the filter condition. The data warehouse should therefore have a strategy for automatically routing each request to the optimal Roll-Up table, so as to reduce the user's learning and usage costs.
According to an embodiment of the present disclosure, there is provided a data processing method for returning, in response to a user's query request, the data result that best matches the user's needs. The method is illustrated below with reference to the drawings. It should be noted that the sequence numbers of the operations in the following method merely identify the operations for the purpose of description and should not be construed as representing their order of execution. The method need not be performed in the exact order shown unless explicitly stated.
Fig. 2 schematically illustrates a flow chart of a data processing method according to an embodiment of the present disclosure.
As shown in fig. 2, the method may include operations S201 to S205.
In operation S201, a data source is acquired.
The data source includes at least one dimension (key) item and at least one index (value) item.
For example, in an advertisement delivery scenario, the data source is in table form and may be referred to as a base table. The data source includes all dimension items and index items for which queries are supported, as shown in Table 1.
TABLE 1
ID | Date | City | Clicks | Cost |
1 | 2017 | Beijing | 100 | 10 |
1 | 2017 | Shanghai | 100 | 10 |
1 | 2018 | Beijing | 200 | 20 |
2 | 2017 | Beijing | 150 | 15 |
2 | 2017 | Shanghai | 200 | 20 |
2 | 2018 | Beijing | 200 | 20 |
As can be seen from the header of Table 1, the table includes three dimension items, ID (identification information), Date, and City, and two index items, Clicks (number of clicks) and Cost (amount spent). The body of the table lists 6 pieces of data, each of which includes a value of the dimension item "ID", a value of the dimension item "Date", a value of the dimension item "City", a value of the index item "Clicks", and a value of the index item "Cost". In this example, different dimension items may characterize different aspects of the advertising data, such as identification information of the user who placed the advertisement, the delivery plan, the delivery unit, the advertisement creative, and so on; they can be defined according to actual service needs and are not limited herein. The index items are the specific data that show the effect of the advertising, such as clicks, impressions, and spend; they too can be defined according to actual service needs and are not limited herein. Table 1 is merely exemplary. In other scenarios, the dimension items and index items in the data source can be changed flexibly. For example, in a music recommendation scenario, the dimension items may include music type, musician, recommendation time, and so on, and the index items may include number of plays, number of likes, number of times added to collections, and so on.
Then, in operation S202, the data source is aggregated multiple times to obtain a plurality of aggregation tables.
To facilitate user queries, the data source is aggregated into a plurality of aggregation tables according to different dimension items and/or index items. The dimension items of the aggregation tables differ, and/or the index items of the aggregation tables differ. The data source contains all dimension items and index items for which queries are supported, whereas an aggregation table obtained by aggregation generally contains only some of the dimension items and/or some of the index items. The aggregation tables differ from one another, and the difference between any two aggregation tables lies in at least one of the following: the dimension items, the index items, and the aggregation mode.
Next, in operation S203, a query request from the client is received.
Wherein the query request from the client may include one or more parameters that can describe the query objective of the user.
Next, in operation S204, an aggregate table conforming to the query request is selected from the plurality of aggregate tables.
In this operation S204, an aggregation table that matches parameters in the query request is obtained from the plurality of aggregation tables, and is used as a query result for the query request.
Next, in operation S205, the aggregate table that meets the query request obtained by the screening is transmitted to the client.
Those skilled in the art will understand that, according to the data processing method of the embodiments of the present disclosure, when a user needs to query aggregated data for certain dimensions and certain indexes, the aggregation table that best fits the query request is screened, based on the query request, from a plurality of aggregation tables obtained by preprocessing, and the screened aggregation table is then returned to the client. Unlike the related art, a plurality of aggregation tables need not be returned to the client, so the user does not have to choose among returned aggregation tables. The user can obtain the data of interest as quickly as possible without incurring substantial learning and usage costs, which meets the user's requirements.
According to an embodiment of the present disclosure, the above process of aggregating the data source multiple times to obtain a plurality of aggregation tables is described below by taking a single aggregation as an example. For any one of the multiple aggregations, first, a dimension item for this aggregation is determined from the at least one dimension item contained in the data source, an index item for this aggregation is determined from the at least one index item contained in the data source, and an aggregation function for this aggregation is also determined. The data source is then aggregated based on the dimension item, the index item, and the aggregation function for this aggregation to obtain the aggregation table for this aggregation.
For example, when data is imported into the data source, the data may be arranged in ascending or descending order of the values of the dimension items in order to facilitate data analysis. If there are multiple dimension items, they are used as sort keys one after another in a predefined order. In Table 1 above, the order of the dimension items is predefined as "ID" → "Date" → "City". The data is first arranged in ascending order of the value of the dimension item "ID"; data with the same "ID" value is then arranged in ascending order of the value of the dimension item "Date"; and data with the same "Date" value is then arranged in ascending order of the value of the dimension item "City". A sorting rule appropriate to the type of each dimension item's values is used, for example sorting integers by magnitude and sorting dates chronologically.
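The multi-key ordering described above can be sketched as follows. This is a minimal illustration only: the tuple layout, the shuffled input order, and the names `DIMENSION_ORDER` and `sort_rows` are assumptions made for the example and do not come from the disclosure.

```python
from operator import itemgetter

# Rows of Table 1 in an arbitrary (assumed) import order:
# (ID, Date, City, Clicks, Cost).
rows = [
    (2, 2018, "Beijing", 200, 20),
    (1, 2017, "Shanghai", 100, 10),
    (2, 2017, "Beijing", 150, 15),
    (1, 2018, "Beijing", 200, 20),
    (2, 2017, "Shanghai", 200, 20),
    (1, 2017, "Beijing", 100, 10),
]

# Predefined dimension order "ID" -> "Date" -> "City",
# expressed as tuple positions 0, 1, 2 (hypothetical encoding).
DIMENSION_ORDER = (0, 1, 2)


def sort_rows(rows, dimension_order=DIMENSION_ORDER):
    """Sort ascending by each dimension in turn; later dimensions break ties."""
    return sorted(rows, key=itemgetter(*dimension_order))


sorted_rows = sort_rows(rows)
for row in sorted_rows:
    print(row)
```

With this input, the sorted rows appear in the same order as the body of Table 1.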
When the data source is aggregated once, suppose that the dimension items for this aggregation are dimension item K1 and dimension item K2, the index item for this aggregation is index item V1, and the aggregation function for this aggregation is f. If N pieces of data in the data source have the same value of dimension item K1 and the same value of dimension item K2, the values of index item V1 of those N pieces of data are aggregated by the aggregation function f, so that the N pieces of data are merged into one piece of data with dimension items K1 and K2 and index item V1, where N is a positive integer greater than 1. Common aggregation functions include sum (Sum), count (Count), minimum (Min), maximum (Max), and so on. Illustratively, a data source is shown in Table 2, which includes the dimension item "Date", the dimension item "City", and the index item "Cost".
TABLE 2
Date | City | Cost |
2017 | Beijing | 10 |
2017 | Tianjin | 20 |
2018 | Beijing | 10 |
2018 | Tianjin | 20 |
The data source was aggregated for the first time to obtain an aggregate table as shown in table 3. The dimension item for the current aggregation is "Date", the index item for the current aggregation is "Cost", and the aggregation function for the current aggregation is a sum function.
TABLE 3
Date | Cost |
2017 | 30 |
2018 | 30 |
The data source was subjected to a second aggregation to obtain an aggregation table as shown in table 4. The dimension term for the current aggregation is "City", the index term for the current aggregation is "Cost", and the aggregation function for the current aggregation is also a sum function.
TABLE 4
City | Cost |
Beijing | 20 |
Tianjin | 40 |
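The two aggregations above can be reproduced with a short sketch. It is an illustration under stated assumptions, not the disclosed implementation: the row-as-dictionary representation and the helper name `aggregate` are hypothetical, and Python's built-in `sum` stands in for the sum aggregation function.

```python
from collections import defaultdict

# Table 2: the data source, one dictionary per piece of data.
TABLE_2 = [
    {"Date": 2017, "City": "Beijing", "Cost": 10},
    {"Date": 2017, "City": "Tianjin", "Cost": 20},
    {"Date": 2018, "City": "Beijing", "Cost": 10},
    {"Date": 2018, "City": "Tianjin", "Cost": 20},
]


def aggregate(rows, dims, index_item, func):
    """Group rows by the chosen dimension items and apply func to the index item.

    `aggregate` is a hypothetical helper: rows that share the same values for
    every item in `dims` are merged into a single output row.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in dims)
        groups[key].append(row[index_item])
    return [
        {**dict(zip(dims, key)), index_item: func(values)}
        for key, values in groups.items()
    ]


table_3 = aggregate(TABLE_2, ["Date"], "Cost", sum)  # first aggregation
table_4 = aggregate(TABLE_2, ["City"], "Cost", sum)  # second aggregation
print(table_3)
print(table_4)
```

Passing `min`, `max`, or `len` instead of `sum` would give the Min, Max, and Count variants mentioned earlier.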
In embodiments of the present disclosure, because the data monitored by the data source is generated continuously over time, the data source undergoes a data update and a version update once every predetermined period. For example, the data source formed after the first batch of data is loaded into the base table is version 1, the data source formed after the second batch of data is loaded into the base table is version 2, and so on. When an aggregation operation is performed, the resulting aggregation table has the same version as the data source on which the aggregation operation was performed. For example, if Table 3 above is aggregated from the version 1 data source, the version number of Table 3 is set to version 1.
According to embodiments of the present disclosure, a query request from a client may include a query version. The process of screening, from the plurality of aggregation tables, the aggregation table that meets the query request may include: screening the plurality of aggregation tables according to the query version to obtain the aggregation tables that pass a first screening. For example, suppose that the latest version of the data source on the server side is version 3 and the user wishes to view the latest version of the aggregated data, so the query request includes a parameter characterizing "version 3". The server side includes a first server, a second server, and a third server arranged in a distributed manner. The first server and the second server have been updated to version 3, whereas the third server, because of delays between machines, has so far been updated only to version 2. When the query request containing the parameter characterizing "version 3" is assigned to the first server or the second server, that server screens, based on the query request, the aggregation tables whose version number is "version 3" and returns the screened aggregation table to the user. When the query request is assigned to the third server, the third server finds, based on the query request, that no aggregation table with the version number "version 3" currently exists, and returns a "no query result" prompt to the user. Screening strictly by query version avoids inconsistent aggregation results caused by data updates that are not synchronized across machines. The first screening may be referred to as screening based on a version rule.
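The version-based first screening in the three-server example can be sketched as below. This is a hedged illustration: the `AggregationTable` class, the table names, and the `answer` helper are assumptions introduced for the example, and the real servers would hold actual tables rather than name and version pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregationTable:
    name: str     # hypothetical table name
    version: int  # version of the data source it was aggregated from


def first_screening(tables, query_version):
    """Keep only the tables whose version equals the requested version."""
    return [t for t in tables if t.version == query_version]


def answer(tables, query_version):
    """Return the matching tables, or a prompt when the server has none."""
    matched = first_screening(tables, query_version)
    return matched if matched else "no query result"


# A server already updated to version 3, and one lagging at version 2.
up_to_date_server = [AggregationTable("by_date", 3), AggregationTable("by_city", 3)]
lagging_server = [AggregationTable("by_date", 2), AggregationTable("by_city", 2)]

result_current = answer(up_to_date_server, 3)
result_lagging = answer(lagging_server, 3)
print(result_current)
print(result_lagging)
```

Using exact equality rather than "latest available" is what keeps a lagging server from silently returning stale version 2 data for a version 3 request.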
According to an embodiment of the present disclosure, the query request may further include a specified dimension item and a specified index item. There may be one or more specified dimension items and one or more specified index items. The specified dimension items and the specified index items characterize the dimension items and index items that are of interest to the querying user. The process of screening, from the plurality of aggregation tables, the aggregation table that meets the query request may further include: screening, from the aggregation tables that pass the first screening, the aggregation tables that contain the specified dimension items and the specified index items, to obtain the aggregation tables that pass the second screening. The second screening may be referred to as query-term-rule-based screening.
It will be appreciated that an aggregation table that passes the second screening includes the dimension items and index items of interest to the user, but may also include dimension items and index items that are not of interest to the user. If the dimension items of such a table are ordered unsuitably, looking up the dimension items and index items of interest to the user is inefficient. To this end, the aggregation tables that pass the second screening may be further subjected to a third screening. According to an embodiment of the present disclosure, the query request may further include a filtering condition, where the filtering condition includes a specified dimension item and a value of the specified dimension item. There may be one or more specified dimension items.
The process of screening, from the plurality of aggregation tables, the aggregation table that meets the query request may further include: for any aggregation table among the aggregation tables that pass the second screening, searching the dimension items in that aggregation table in a predetermined order. Each dimension item found is matched against the specified dimension items; if the match succeeds, a weight for the dimension item is determined and the search continues with the next dimension item; if the match fails, the search ends. A score for that aggregation table is then determined based on the weights of the matched dimension items. After the score of each aggregation table that passes the second screening has been determined, the aggregation table with the highest score is selected as the aggregation table that passes the third screening. The higher the score of an aggregation table, the faster all specified dimension items and their values can be found in that aggregation table in the predetermined order. The third screening may be referred to as filtering-and-sorting-rule-based screening.
Further, according to an embodiment of the present disclosure, the process of screening, from the plurality of aggregation tables, the aggregation table that meets the query request may further include: selecting, from the aggregation tables that pass the third screening, the aggregation table with the fewest dimension items and/or the smallest amount of data. For example, if the dimension items are arranged as columns, the aggregation table with fewer dimension columns is selected, because such a table is aggregated to a higher degree, holds less data, and is therefore more efficient to query. Similarly, if the number of rows in an aggregation table represents its amount of data, the aggregation table with fewer rows is selected, because it holds less data and is more efficient to query.
The embodiments described above are exemplarily described with reference to fig. 3 in combination with a specific example.
Fig. 3 schematically illustrates a flow chart of a data processing method according to another embodiment of the present disclosure. In the example shown in fig. 3, the data source is as shown in Table 1, in which the different dimension items are arranged by column, the different index items are also arranged by column, and the different data records are arranged by row. The columns corresponding to the dimension items are hereinafter referred to as dimension columns, and the columns corresponding to the index items are hereinafter referred to as index columns. Aggregating the data source shown in Table 1 yields a plurality of aggregation tables.
As shown in fig. 3, the method may include operations S301 to S307.
In operation S301, a query request from a client is received.
In operation S302, a first filtering is performed on a plurality of aggregation tables based on a version rule according to a query version in a query request.
To enable concurrent access to data, multi-version concurrency control (MVCC) is typically implemented in a data warehouse so that users do not obtain inconsistent data in response to a request. Accordingly, the user's request includes the version number being requested, and the data processing method according to the present disclosure selects aggregation tables according to that version number: if the version number that an aggregation table can currently serve is not less than the requested version number, the rule is satisfied and the aggregation table proceeds to the next rule; otherwise, the aggregation table is excluded.
Then, in operation S303, the aggregated form subjected to the first screening is subjected to the second screening based on the query term rule according to the specified dimension term and the specified index term in the query request.
That is, it is determined whether the aggregation table contains all of the specified dimension items and all of the specified index items. If so, the aggregation table proceeds to the next rule; otherwise, the aggregation table is excluded.
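The containment check of the second screening can be sketched in Python as a subset test. The dictionary layout of a table and the names used below are assumptions made for illustration only.

```python
def second_screening(tables, specified_dimensions, specified_indexes):
    """Keep the tables that contain every specified dimension item and index item."""
    wanted_dims = set(specified_dimensions)
    wanted_indexes = set(specified_indexes)
    return [
        t for t in tables
        if wanted_dims <= set(t["dimensions"]) and wanted_indexes <= set(t["indexes"])
    ]


# Hypothetical aggregation tables that already passed the first screening.
tables = [
    {"name": "by_id_date_city", "dimensions": ["ID", "Date", "City"], "indexes": ["Cost"]},
    {"name": "by_id", "dimensions": ["ID"], "indexes": ["Cost"]},
]

# The query is interested in the dimensions ID and Date and in the index Cost.
kept = second_screening(tables, ["ID", "Date"], ["Cost"])
kept_names = [t["name"] for t in kept]
```

The table aggregated only by ID is excluded because it cannot answer a query that needs the Date dimension.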
Next, in operation S304, according to the filtering condition in the query request, the aggregation table subjected to the second filtering is subjected to the third filtering based on the filtering and sorting rule.
The aggregation tables of the same data source differ in their dimension columns or index columns; in each aggregation table the rows are sorted by the dimension columns and the index columns are aggregated. Similar to a B-tree index in a database, the columns in the query's filtering condition are matched leftmost-first against the dimension column order of the aggregation table, and the more columns that match, the more efficient the query.
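The benefit of leftmost matching can be illustrated with a small Python sketch. It assumes sample rows that are kept sorted by the dimension columns (ID, Date, City), so that a filter on a leading prefix of those columns can be answered by binary search instead of a full scan. The function name and the in-memory list are illustrative assumptions, not the storage layout of the disclosure.

```python
from bisect import bisect_left, bisect_right

# Sample rows sorted by the dimension columns (ID, Date, City); the last field is Cost.
sorted_rows = [
    (1, 2017, "Beijing", 10),
    (1, 2017, "Shanghai", 10),
    (1, 2018, "Beijing", 20),
    (2, 2017, "Beijing", 15),
    (2, 2017, "Shanghai", 20),
    (2, 2018, "Beijing", 20),
]


def prefix_lookup(rows, prefix):
    """Return the rows whose leading dimension columns equal `prefix`."""
    k = len(prefix)
    # The key list is rebuilt here only for clarity; a real store would keep it indexed.
    keys = [row[:k] for row in rows]
    lo = bisect_left(keys, prefix)   # first position whose key is >= prefix
    hi = bisect_right(keys, prefix)  # first position whose key is > prefix
    return rows[lo:hi]


# Filter ID = 1 and Date = 2017 matches the two leftmost dimension columns.
matched = prefix_lookup(sorted_rows, (1, 2017))
```

If the same rows were sorted by (ID, City, Date), only the ID part of this filter would form a leading prefix, and the Date condition would have to be checked row by row within the ID range.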
For example, the aggregate table subjected to the second screening includes two aggregate tables as shown in tables 5 and 6, respectively.
TABLE 5
ID | Date | City | Cost |
---|---|---|---|
1 | 2017 | Beijing | 10 |
1 | 2017 | Shanghai | 10 |
1 | 2018 | Beijing | 20 |
2 | 2017 | Beijing | 15 |
2 | 2017 | Shanghai | 20 |
2 | 2018 | Beijing | 20 |
TABLE 6
ID | City | Date | Cost |
---|---|---|---|
1 | Beijing | 2017 | 10 |
1 | Beijing | 2018 | 20 |
1 | Shanghai | 2017 | 10 |
2 | Beijing | 2017 | 15 |
2 | Beijing | 2018 | 20 |
2 | Shanghai | 2017 | 20 |
The filtering condition in the query request includes, for example: "ID" = 1 and "Date" = 2017. For the aggregation table shown in Table 5, the columns are searched in order from left to right; for each column searched, it is determined whether the dimension item corresponding to the column matches any specified dimension item in the filtering condition, and the weight of the column is determined according to the matching result. After the search is completed, the score of the aggregation table shown in Table 5 is calculated from the weights of the respective columns. The aggregation table shown in Table 6 is processed in the same way: its columns are searched from left to right, the weight of each column is determined according to whether its dimension item matches a specified dimension item in the filtering condition, and the score of Table 6 is calculated from those weights after the search is completed.
Illustratively, the manner of calculating the score for each aggregate table described above is as follows:
weight = 0  # score of the aggregation table, initial value 0
for i, column in enumerate(schema, start=1):  # traverse the table structure from left to right by column
    if column in query_columns:
        # the column is a dimension column contained in the filtering condition:
        # the current score equals the previous score shifted left by 1 bit plus the weight of this column
        weight = (weight << 1) + w(i)
    else:
        # the column is not a dimension column contained in the filtering condition: exit the loop
        break
Here, weight represents the score of the aggregation table and has an initial value of 0, column represents each column in the aggregation table, schema represents the table structure, and query_columns represents the dimension columns contained in the filtering condition of the query request. w(i) represents the weight determined for the i-th column based on filter matching, where i is a positive integer. The value is set according to selectivity (or degree of matching): the higher the selectivity (or degree of matching), the higher the weight. For example, the weight of a complete match may be set to 5 and the weight of an incomplete match may be set to 4; the present disclosure is not limited thereto.
For the aggregation table of Table 5, searching from left to right, the first column matches "ID" in the filtering condition, so w(1) = 5 and weight = 0 + 5 = 5. The search continues to the second column, which matches "Date" in the filtering condition, so w(2) = 5 and weight = 10 + 5 = 15, where 10 is the previous score of 5 shifted left by 1 bit. The search continues to the third column, which does not match the filtering condition, so the search ends with a final score of 15. For the aggregation table of Table 6, searching from left to right, the first column matches "ID" in the filtering condition, so w(1) = 5 and weight = 0 + 5 = 5. The search continues to the second column, which does not match the filtering condition, so the search ends with a final score of 5. Because the score of the aggregation table shown in Table 5 is greater than the score of the aggregation table shown in Table 6, the aggregation table shown in Table 5 is selected as the aggregation table that meets the query request.
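The worked example above can be reproduced with the following Python sketch. The function and variable names are illustrative assumptions, and the constant weight of 5 per matched column follows the example rather than any fixed rule of the disclosure.

```python
def score(schema, query_columns, w):
    """Leftmost-match score: per matched column, shift left by 1 bit and add the column weight."""
    weight = 0
    for i, column in enumerate(schema, start=1):
        if column not in query_columns:
            break  # stop at the first dimension column that is not in the filtering condition
        weight = (weight << 1) + w(i)
    return weight


def complete_match_weight(i):
    # Assumed weight function: every matched column is a complete match with weight 5.
    return 5


table_5_dimensions = ["ID", "Date", "City"]
table_6_dimensions = ["ID", "City", "Date"]
query_columns = {"ID", "Date"}  # dimension columns of the filter ID = 1, Date = 2017

score_5 = score(table_5_dimensions, query_columns, complete_match_weight)
score_6 = score(table_6_dimensions, query_columns, complete_match_weight)
best = "Table 5" if score_5 > score_6 else "Table 6"
```

Because the left shift doubles the running score before each new weight is added, a table with a longer matched prefix scores higher than one with a shorter prefix when the per-column weights are equal, as in this example.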
If more than one aggregation table remains after the third screening, the method continues as shown in fig. 3.
Next, in operation S305, an aggregation table with fewer dimension columns is selected.
After operation S305, if a plurality of aggregation tables remain, operation S306 is performed.
Next, in operation S306, the aggregation table with fewer rows is selected. A smaller number of rows indicates a smaller amount of data.
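Operations S305 and S306 amount to two successive tie-breaks, which can be sketched in Python as follows. The table layout and the `row_count` field are assumptions made for this illustration.

```python
def tie_break(tables):
    """Prefer fewer dimension columns (S305); if several remain, prefer fewer rows (S306)."""
    fewest_dims = min(len(t["dimensions"]) for t in tables)
    remaining = [t for t in tables if len(t["dimensions"]) == fewest_dims]
    if len(remaining) > 1:
        fewest_rows = min(t["row_count"] for t in remaining)
        remaining = [t for t in remaining if t["row_count"] == fewest_rows]
    return remaining


# Hypothetical candidates that obtained the same score in the third screening.
candidates = [
    {"name": "a", "dimensions": ["ID", "Date", "City"], "row_count": 600},
    {"name": "b", "dimensions": ["ID", "Date"], "row_count": 400},
    {"name": "c", "dimensions": ["ID", "Date"], "row_count": 300},
]

selected = tie_break(candidates)
selected_names = [t["name"] for t in selected]
```

Table "a" is dropped first because it has more dimension columns, and table "c" is then preferred over table "b" because it has fewer rows.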
Next, in operation S307, the aggregation table obtained by the final screening is transmitted to the client.
It can be appreciated that the data processing method according to the present disclosure provides an automatic routing policy for a data warehouse at query time. The policy comprises a plurality of rules and hides the plurality of aggregation tables from users, reducing the users' learning and usage costs. With this policy, the most efficient aggregation table that satisfies the query request can be found.
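How the rules could be chained into one routing policy is sketched below in Python. This is a simplified illustration under stated assumptions: tables and queries are plain dictionaries, the version rule uses exact matching, every matched filter column has weight 5, and all function and field names are hypothetical.

```python
def by_version(tables, query):
    return [t for t in tables if t["version"] == query["version"]]


def by_query_terms(tables, query):
    return [
        t for t in tables
        if set(query["dimensions"]) <= set(t["dimensions"])
        and set(query["indexes"]) <= set(t["indexes"])
    ]


def prefix_score(table, query, column_weight=5):
    # Leftmost match of the filter columns against the table's dimension column order.
    score = 0
    for column in table["dimensions"]:
        if column not in query["filter"]:
            break
        score = (score << 1) + column_weight
    return score


def keep_extreme(tables, key, pick):
    # Keep every table whose key equals the best (max or min) key among the candidates.
    best = pick(key(t) for t in tables)
    return [t for t in tables if key(t) == best]


RULES = [
    by_version,
    by_query_terms,
    lambda ts, q: keep_extreme(ts, lambda t: prefix_score(t, q), max),
    lambda ts, q: keep_extreme(ts, lambda t: len(t["dimensions"]), min),
    lambda ts, q: keep_extreme(ts, lambda t: t["row_count"], min),
]


def route(tables, query):
    """Apply the rules in order and return the selected table, or None if none qualifies."""
    for rule in RULES:
        tables = rule(tables, query)
        if not tables:
            return None
    return tables[0]


tables = [
    {"name": "t5", "version": 3, "dimensions": ["ID", "Date", "City"], "indexes": ["Cost"], "row_count": 6},
    {"name": "t6", "version": 3, "dimensions": ["ID", "City", "Date"], "indexes": ["Cost"], "row_count": 6},
    {"name": "old", "version": 2, "dimensions": ["ID", "Date"], "indexes": ["Cost"], "row_count": 4},
]
query = {"version": 3, "dimensions": ["ID", "Date"], "indexes": ["Cost"], "filter": {"ID": 1, "Date": 2017}}
chosen = route(tables, query)
```

Expressing the policy as an ordered list of rules keeps each rule independent, so the user only submits the query and never names an aggregation table.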
Fig. 4 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 4, the data processing apparatus 400 includes: acquisition module 410, aggregation module 420, receiving module 430, screening module 440, and sending module 450.
The acquisition module 410 is configured to acquire a data source.
Wherein the data source includes at least one dimension item and at least one index item.
The aggregation module 420 is configured to aggregate the data sources multiple times respectively to obtain multiple aggregation tables.
The receiving module 430 is configured to receive a query request from a client.
The filtering module 440 is configured to filter from a plurality of aggregate forms to obtain an aggregate form that meets the query request.
The sending module 450 is configured to send the aggregate table that meets the query request to the client.
It should be noted that, in the embodiment of the apparatus portion, the implementation manner, the solved technical problem, the realized function, and the achieved technical effect of each module/unit/subunit and the like are the same as or similar to the implementation manner, the solved technical problem, the realized function, and the achieved technical effect of each corresponding step in the embodiment of the method portion, and are not described herein again.
Any number of modules, sub-modules, units, sub-units, or at least some of the functionality of any number of the sub-units according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented as split into multiple modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-substrate, a system-on-package, an Application Specific Integrated Circuit (ASIC), or in any other reasonable manner of hardware or firmware that integrates or encapsulates the circuit, or in any one of or a suitable combination of three of software, hardware, and firmware. Or one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be at least partially implemented as computer program modules, which, when executed, may perform the corresponding functions.
For example, any of the acquisition module 410, the aggregation module 420, the receiving module 430, the screening module 440, and the transmitting module 450 may be combined and implemented in one module, or any of the modules may be split into a plurality of modules. Or at least some of the functionality of one or more of the modules may be combined with, and implemented in, at least some of the functionality of other modules. According to embodiments of the present disclosure, at least one of the acquisition module 410, the aggregation module 420, the reception module 430, the screening module 440, and the transmission module 450 may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable way of integrating or packaging the circuitry, or in any one of or a suitable combination of three of software, hardware, and firmware. Or at least one of the acquisition module 410, the aggregation module 420, the receiving module 430, the screening module 440, and the transmitting module 450 may be at least partially implemented as computer program modules which, when executed, may perform the corresponding functions.
Fig. 5 schematically shows a block diagram of a computer device adapted to implement the above-described method according to an embodiment of the present disclosure. The computer device illustrated in fig. 5 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 5, a computer device 500 according to an embodiment of the present disclosure includes a processor 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The processor 501 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 501 may also include on-board memory for caching purposes. The processor 501 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flows according to embodiments of the disclosure.
In the RAM 503, various programs and data required for the operation of the device 500 are stored. The processor 501, ROM 502, and RAM 503 are connected to each other by a bus 504. The processor 501 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 502 and/or the RAM 503. Note that the program may be stored in one or more memories other than the ROM 502 and the RAM 503. The processor 501 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the device 500 may further comprise an input/output (I/O) interface 505, the input/output (I/O) interface 505 also being connected to the bus 504. The device 500 may also include one or more of the following components connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output portion 507 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker, and the like; a storage portion 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. The drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as needed so that a computer program read therefrom is mounted into the storage section 508 as needed.
According to embodiments of the present disclosure, the method flow according to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 509, and/or installed from the removable media 511. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 501. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 502 and/or RAM 503 and/or one or more memories other than ROM 502 and RAM 503 described above.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or incorporated in various ways, even if such combinations or incorporations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be combined and/or incorporated in various ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or incorporations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. These examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.
Claims (11)
1. A data processing method, comprising:
Acquiring a data source, wherein the data source comprises at least one dimension item and at least one index item;
Respectively carrying out multiple aggregation on the data sources to obtain multiple aggregation tables;
receiving a query request from a client, wherein the query request comprises: query version, specified dimension item, specified index item and filtering condition;
Screening from the plurality of aggregation tables to obtain an aggregation table conforming to the query request; and
Sending the aggregation table conforming to the query request to the client;
Wherein the filtering the aggregate table from the plurality of aggregate tables to obtain the aggregate table meeting the query request includes:
According to the query version, performing first screening on the aggregation tables;
Performing second screening on the aggregation table subjected to the first screening based on a query term rule according to the specified dimension item and the specified index item;
and according to the filtering conditions, third screening is carried out on the aggregation table subjected to the second screening based on filtering and ordering rules.
2. The method of claim 1, wherein aggregating the data sources a plurality of times, respectively, to obtain a plurality of aggregated tables comprises:
For any one of the multiple aggregations,
Determining a dimension item for the any one aggregation from the at least one dimension item;
determining an index item for the any one aggregation from the at least one index item;
Determining an aggregation function for the any one aggregation; and
Aggregating the data sources based on the dimension item for the any one aggregation, the index item for the any one aggregation, and the aggregation function for the any one aggregation to obtain an aggregation table for the any one aggregation.
3. The method of claim 1, wherein,
The data source performs a data update and a version update once every preset period, and the version of any aggregation table of the plurality of aggregation tables is the same as the version of the data source from which that aggregation table was aggregated.
4. The method of claim 1, wherein the first screening the plurality of aggregated tables according to the query version comprises: and screening the aggregation tables based on version rules according to the query version to obtain the aggregation table subjected to the first screening.
5. The method of claim 1, wherein the second filtering the aggregate table through the first filtering based on query term rules according to the specified dimension term and the specified index term comprises: and screening the aggregation table containing the specified dimension items and the specified index items from the aggregation table subjected to the first screening to obtain the aggregation table subjected to the second screening.
6. The method of claim 1, wherein the filter criteria comprises the specified dimension item and a value of the specified dimension item;
the third filtering, based on the filtering and sorting rule, the aggregation table subjected to the second filtering according to the filtering condition includes:
For any one of the aggregate forms that have been subjected to the second screening,
Searching dimension items in any aggregation table according to a preset sequence;
Matching each searched dimension item with the appointed dimension item, if the matching is successful, determining the weight of the dimension item, continuing searching the next dimension item, and if the matching is failed, ending the search;
Determining a score for the any aggregate table based on the determined weights for the dimension items; and
The aggregate form with the highest score is selected as the aggregate form that has been subjected to the third screening.
7. The method of claim 6, wherein the screening the aggregate table from the plurality of aggregate tables to conform to the query request further comprises:
And screening the aggregation table with the minimum number of dimension items and/or the minimum number of data from the aggregation table subjected to the third screening.
8. A data processing apparatus comprising:
The acquisition module is used for acquiring a data source, wherein the data source comprises at least one dimension item and at least one index item;
the aggregation module is used for respectively carrying out multiple aggregation on the data sources to obtain a plurality of aggregation tables, wherein the dimension items of the aggregation tables are different, and/or the index items of the aggregation tables are different;
The receiving module is used for receiving a query request from the client, wherein the query request comprises: query version, specified dimension item, specified index item and filtering condition;
the screening module is used for screening the aggregation tables from the plurality of aggregation tables to obtain the aggregation tables conforming to the query request; and
The sending module is used for sending the aggregation table conforming to the query request to the client;
Wherein the filtering the aggregate table from the plurality of aggregate tables to obtain the aggregate table meeting the query request includes:
According to the query version, performing first screening on the aggregation tables;
Performing second screening on the aggregation table subjected to the first screening based on a query term rule according to the specified dimension item and the specified index item;
and according to the filtering conditions, third screening is carried out on the aggregation table subjected to the second screening based on filtering and ordering rules.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing when executing the program:
The method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform:
The method according to any one of claims 1 to 7.
11. A computer program product comprising a computer program comprising computer executable instructions which, when executed, are adapted to carry out the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010024591.1A CN113094444B (en) | 2020-01-09 | 2020-01-09 | Data processing method, data processing device, computer equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113094444A (en) | 2021-07-09 |
CN113094444B (en) | 2024-10-18 |
Family
ID=76663568
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010024591.1A Active CN113094444B (en) | 2020-01-09 | 2020-01-09 | Data processing method, data processing device, computer equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113094444B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113448986B (en) * | 2021-09-01 | 2022-03-01 | 阿里云计算有限公司 | Query method, query device, storage medium and program product |
CN115563103B (en) * | 2022-09-15 | 2023-12-08 | 河南星环众志信息科技有限公司 | Multi-dimensional aggregation method, system, electronic equipment and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109213829A (en) * | 2017-06-30 | 2019-01-15 | 北京国双科技有限公司 | Data query method and device |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103034633B (en) * | 2011-09-30 | 2016-08-03 | 国际商业机器公司 | Generate the method and device of the result of page searching summary of extension |
US9529830B1 (en) * | 2016-01-28 | 2016-12-27 | International Business Machines Corporation | Data matching for column-oriented data tables |
CN110309358A (en) * | 2018-03-27 | 2019-10-08 | 京东方科技集团股份有限公司 | A kind of resource recommendation method and system |
CN110287213B (en) * | 2019-07-03 | 2023-02-17 | 中通智新(武汉)技术研发有限公司 | Data query method, device and system based on OLAP system |
Also Published As
Publication number | Publication date |
---|---|
CN113094444A (en) | 2021-07-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112800095B (en) | | Data processing method, device, equipment and storage medium |
CN113986933A (en) | | Materialized view creating method and device, storage medium and electronic equipment |
CN109255586B (en) | | Online personalized recommendation method for e-government affairs handling |
CN102667761A (en) | | Scalable cluster database |
CN106844407B (en) | | Tag network generation method and system based on data set correlation |
CN110990447B (en) | | Data exploration method, device, equipment and storage medium |
US20140006166A1 (en) | | System and method for determining offers based on predictions of user interest |
CN111367965B (en) | | Target object determining method, device, electronic equipment and storage medium |
CN111489201A (en) | | Method, device and storage medium for analyzing customer value |
CN103324701A (en) | | Data searching device and method |
CN113094444B (en) | | Data processing method, data processing device, computer equipment and medium |
CN110750555A (en) | | Method, apparatus, computing device, and medium for generating index |
CN113312410A (en) | | Data map construction method, data query method and terminal equipment |
CN111159213A (en) | | Data query method, device, system and storage medium |
CN109885651A (en) | | Question pushing method and device |
CN113569162A (en) | | Data processing method, device, equipment and storage medium |
CN104636422B (en) | | Method and system for mining patterns in a data set |
CN110765100B (en) | | Label generation method and device, computer readable storage medium and server |
CN111125158B (en) | | Data table processing method, device, medium and electronic equipment |
CN113379551A (en) | | Transaction data analysis method and device and electronic equipment |
CN115168474B (en) | | Internet of things central station system building method based on big data model |
CN112950239A (en) | | Method, apparatus, device and computer readable medium for generating user information |
CN117291722A (en) | | Object management method, related device and computer readable medium |
CN115481026A (en) | | Test case generation method and device, computer equipment and storage medium |
Bhutani et al. | | WSEMQT: a novel approach for quality-based evaluation of web data sources for a data warehouse |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||