Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide an online index selection method and device based on a neural network and an attenuation strategy, so that higher accuracy can be realized when the execution cost of the query is calculated.
In order to achieve the above purpose, the invention provides an online index selection method based on a neural network and an attenuation strategy, which specifically comprises the following steps:
training the tree convolution neural network to enable the tree convolution neural network to output corresponding execution cost based on the query execution plan;
constructing a hash table, wherein keys of the constructed hash table are columns capable of establishing indexes in a database, and values are arrays, and the arrays are used for storing benefits of indexes on corresponding columns to past queries;
adopting an attenuation strategy, and obtaining a new value of the value at each position of each array in the hash table based on the attenuation parameters;
Disabling the existing indexes in the database, and obtaining default execution cost of the new query under default index configuration based on the tree convolutional neural network;
Calculating the income of the index pair of each column in the database based on the default execution cost of the newly-arrived query, and adding the income into the tail part of the corresponding array in the hash table;
Enabling existing indexes based on index instructions in a database, sequencing the indexes according to total benefits, and selecting a preset number of indexes from high to low as new index configuration.
Based on the technical scheme, the training tree convolutional neural network is used for enabling the tree convolutional neural network to output corresponding execution cost based on the query execution plan, and the specific steps comprise:
Acquiring an executed query execution plan and corresponding execution cost in a database;
the method comprises the steps of forming a data set by an acquired query execution plan and execution cost, and dividing data in the data set into a training set and a testing set according to a preset proportion;
Training the tree convolution neural network by using the training set, judging the training result of the tree convolution neural network by using the testing set, and obtaining the trained tree convolution neural network after the tree convolution neural network converges.
On the basis of the technical scheme, the hash table is constructed, and the specific construction process is as follows:
Traversing all columns and indexes of each table in the database, and taking the column without indexes as a column capable of establishing indexes;
and assigning the index-built columns and a null array to the hash table as key value pairs one by one to realize the construction of the hash table.
Based on the technical scheme, the method adopts an attenuation strategy, obtains a new value of the value at each position of each array in the hash table based on the attenuation parameter, and specifically comprises the following steps:
Establishing a two-layer for loop, wherein the first layer for loop is used for circularly traversing each array in the hash table, and the second layer for loop is used for circularly traversing each position in the array;
multiplying the value of each position of each array in the hash table by the attenuation parameter to obtain a new value as the value of the current position.
Based on the technical scheme, the tree convolutional neural network obtains the default execution cost of the new query under the default index configuration, and the specific steps include:
Invoking a database optimizer to obtain a query execution plan for the new query;
the new query execution plan is input into the tree convolutional neural network, and the output is the execution cost of the new query under the default index configuration, namely the default execution cost.
Based on the technical scheme, the obtaining of the income of the index pair on each column in the database based on the default execution cost of the income inquiry comprises the following specific steps:
for each column appearing in the new query, establishing a virtual index on the current column, and calculating to obtain the query execution cost of the current column at the moment based on a database optimizer and a tree convolutional neural network;
Subtracting the query execution cost of the current column from the default execution cost of the new query to obtain the benefit of the index on the current column on the new query.
Based on the technical scheme, the adding of the benefit to the tail of the corresponding array in the hash table is specifically as follows:
The benefit of the new-to-query is added to the tail of the corresponding array in the hash table and the virtual index is deleted, and for columns that do not appear in the new-to-query, 0 is added to the tail of the corresponding array in the hash table.
Based on the technical scheme, after adding the income from the new query to the tail of the corresponding array in the hash table, the method further comprises the following steps:
judging whether the length of the current array is larger than the preset length, if so, deleting the head element of the current array, and if not, not processing.
Based on the technical scheme, the indexes are ordered according to total benefits, and a preset number of indexes are selected from high to low to serve as new index configuration, and the specific steps include:
calculating the sum of each array in the hash table, and taking the sum as the total income of the index on the corresponding column;
And sorting the indexes according to the total income, selecting a preset number of indexes from high to low as new index configuration, and replacing the existing index configuration in the database with the new index configuration.
The invention provides an online index selection device based on a neural network and an attenuation strategy, which comprises the following components:
the training module is used for training the tree convolution neural network so that the tree convolution neural network outputs corresponding execution cost based on the query execution plan;
the construction module is used for constructing a hash table, keys of the constructed hash table are columns which can be used for establishing indexes in the database, the values are arrays, and the arrays are used for saving benefits of the indexes on the corresponding columns to the past query;
the first calculation module is used for obtaining a new value of the value at each position of each array in the hash table based on the attenuation parameters by adopting an attenuation strategy;
the disabling module is used for disabling the existing indexes in the database and obtaining default execution cost of the new query under default index configuration based on the tree convolution neural network;
the second calculation module is used for calculating the income of the index pair of each column in the database to the new query based on the default execution cost of the new query, and adding the income to the tail of the corresponding array in the hash table;
and the starting module is used for starting the existing indexes based on the index instructions in the database, sequencing the indexes according to the total income, and selecting a preset number of indexes from high to low as new index configuration.
Compared with the prior art, the invention has the advantages that:
(1) The invention calculates the execution cost of the query by using the tree convolution neural network, and the network uses the detector capable of identifying the specific tree structure and can capture the specific tree structure existing in the query plan, thereby having higher accuracy when calculating the execution cost of the query;
(2) The method uses the attenuation strategy to calculate the total gain of the index, and the strategy ensures that the more recent queries occupy larger weight when calculating the total gain of the index, and can finally obtain more reasonable total gain of the index;
(3) Compared with the existing index selection method, the method provided by the invention can be well performed in a dynamic environment with continuous query arrival.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application.
The present invention uses a tree convolutional neural network to estimate the execution cost of a query. Tree convolutional neural networks were developed from conventional convolutional neural networks that use probes that identify a particular tree structure. Because the execution plan generated by the database optimizer has a tree-like structure, the tree convolutional neural network can capture some specific structures in the tree-like structure, and obtain more accurate numerical values than the execution cost contained in the execution plan. In a dynamic environment where queries arrive continuously, the longer the queries are, the less relevant the queries are to the latest queries, and the weight of old queries in calculating index returns needs to be reduced. Therefore, the invention designs an attenuation strategy for calculating the total income of the index in the dynamic environment, which ensures that the more recent inquiry has larger influence on the index income and can more effectively adjust the existing index configuration.
Referring to fig. 1, the method for selecting an online index based on a neural network and an attenuation strategy according to the embodiment of the present invention specifically includes the following steps:
s1, training a tree convolutional neural network to enable the tree convolutional neural network to output corresponding execution cost, namely execution time, based on a query execution plan;
That is, the execution time of the queries executed by the database is collected, the queries are input into the database optimizer, the execution plans of the queries can be obtained, and the tree convolutional neural network is trained by the execution plan and the actual execution time of the queries. After being trained, the tree convolution neural network can output an accurate estimated execution cost by taking the execution plan of the query as input. In the following steps, the tree convolutional neural network will replace the database optimizer to calculate the execution cost of the query.
In the invention, training the tree convolution neural network to make the tree convolution neural network output corresponding execution cost based on the query execution plan, specifically comprising the following steps:
S101, acquiring an executed query execution plan and corresponding execution cost in a database;
S102, forming a data set by the acquired query execution plan and execution cost, and dividing the data in the data set into a training set and a testing set according to a preset proportion;
And S103, training the tree convolutional neural network by using a training set, judging the training result of the tree convolutional neural network by using a testing set, and obtaining the trained tree convolutional neural network after the tree convolutional neural network converges.
In the practical application process, the programming language is Python, and the database is PostgreSQL. By searching the log file of the database, the executed queries and the corresponding execution costs in the database can be obtained, the execution plans of the queries can be obtained through EXPLAIN commands, the query execution plans and the corresponding execution costs form a data set, the data in the data set are divided into a training set and a testing set according to the ratio of 9:1, the training set is used for training the tree convolutional neural network, the training result of the tree convolutional neural network is judged by adopting the testing set, and the training can be stopped after the tree convolutional neural network is observed to be converged.
S2, constructing a hash table, wherein keys of the constructed hash table are columns which can be used for establishing indexes in a database, and the values are arrays which are used for storing benefits of indexes on the corresponding columns to the past query;
that is, a preparation work before query processing is performed, an empty hash table is created, the keys of the hash table are each index-built column in the database, and the value is an array. For example, the maximum value of each array length is w, i.e., the index on the corresponding column is reserved in the array at most for the benefit of the past w queries.
In the invention, a hash table is constructed, and the specific construction process is as follows:
s201, traversing all columns and indexes of each table in a database, and taking the columns without indexes as columns capable of establishing indexes;
s202, assigning the index-built columns and a null array to the hash table one by one as key value pairs to realize the construction of the hash table.
In the actual application process, using an \d+ command to check all columns and index conditions of each table in the database one by one, wherein the column without the index is used as an index-establishing column, in Python, a hash table is realized by using a data structure of a direct, and the index-establishing column and a null array are used as key value pairs one by one to be assigned to the hash table, so that the hash table is constructed.
S3, adopting an attenuation strategy, and obtaining a new value of the value at each position of each array in the hash table based on attenuation parameters;
In the invention, a new value of the value at each position of each array in the hash table is obtained based on attenuation parameters by adopting an attenuation strategy, and the specific steps comprise:
s301, establishing a two-layer for loop, wherein the first layer for loop is used for traversing each array in the hash table in a loop mode, and the second layer for loop is used for traversing each position in the array in a loop mode;
S302, multiplying the value of each position of each array in the hash table by the attenuation parameter to obtain a new value as the value of the current position.
The value of each position of each array in the hash table is multiplied by the attenuation parameter as a new value, the attenuation parameter is between 0 and 1, and the value of the attenuation parameter can be continuously adjusted according to the actual environment and the effect of the method in actual application.
S4, disabling the existing indexes in the database, and obtaining default execution cost of the new query under default index configuration based on the tree convolutional neural network;
Since there are previously recommended index configurations in the database that have an impact on evaluating the execution cost of the query, the instructions in the database need to be invoked first to disable the existing index. Specifically, the index in the database may be disabled using the update pg_index SET INDISVALID =false instruction.
In the invention, the default execution cost of the new query under the default index configuration is obtained based on the tree convolution neural network, and the specific steps comprise:
S401, calling a database optimizer to obtain a query execution plan of the new query;
S402, inputting a query execution plan of the new query into the tree convolutional neural network, and outputting the new query to be the execution cost of the new query under the default index configuration, namely the default execution cost.
And calling the database optimizer by using EXPLAIN commands to obtain an execution plan of the new-to-query, taking the execution plan as the input of the tree convolutional neural network, and finally obtaining the execution cost of the new-to-query under the default index configuration.
S5, calculating the income of the index pair of each column in the database to the new query based on the default execution cost of the new query, and adding the income to the tail of the corresponding array in the hash table;
In the invention, based on default execution cost of the new query, the income of indexes on each column in the database to the new query is calculated, and the specific steps include:
s501, for each column appearing in the new query, establishing a virtual index on the current column, and calculating to obtain the query execution cost of the current column at the moment based on a database optimizer and a tree convolution neural network;
s502, subtracting the query execution cost of the current column from the default execution cost of the new query to obtain the benefit of the index on the current column to the new query.
In the actual application process, for each column appearing in the new query, a HypoPG (https:// gitsub. Com/HypoPG/hypopg) plug-in is used to build a virtual index on the column, then according to step S4, the execution cost of the latest query in the presence of the index is obtained, and then the value obtained in step S4 is subtracted by the value obtained in the step, so that the benefit of the index to the new query is obtained.
In the invention, the benefit is added to the tail part of the corresponding array in the hash table, and the method specifically comprises the following steps:
The benefit of the new-to-query is added to the tail of the corresponding array in the hash table and the virtual index is deleted, and for columns that do not appear in the new-to-query, 0 is added to the tail of the corresponding array in the hash table.
The benefits to the query may be added to the tail of the corresponding array in the hash table using the application method of the list in Python.
In one possible implementation, after adding the benefit of the new query to the tail of the corresponding array in the hash table, the method further includes:
judging whether the length of the current array is larger than the preset length, if so, deleting the head element of the current array, and if not, not processing.
Since a new element is added to each array of the hash table, the length of the current array may be greater than w, and thus, if the length of the current array is greater than w, the head element of the array is deleted. The head element is deleted and the other elements are not deleted because the head element is the benefit of the index to the oldest query.
The length of the array can be obtained using the len method, and if the length is greater than w, the head element of the array is deleted using the del statement.
And S6, starting the existing indexes based on the index instruction in the database, sequencing the indexes according to the total income, and selecting the indexes with preset numbers from high to low as new index configuration.
Since the benefit of the index on each column to the new query has now been calculated, the instructions in the database can be invoked to enable the existing index. Specifically, the database existing index may be enabled using the update pg_index SET INDISVALID =true instruction.
In the invention, indexes are ordered according to total income, and a preset number of indexes are selected from high to low to be used as new index configuration, and the specific steps include:
S601, calculating the sum of each array in the hash table, and taking the sum as the total income of the index on the corresponding column;
S602, sorting indexes according to total benefits, selecting a preset number of indexes from high to low as new index configuration, and replacing the existing index configuration in the database with the new index configuration.
In the actual application process, sum statement is used for calculating the sum of each array in the hash table, then the sort statement is used for sorting the arrays, and then indexes with preset numbers are selected from high to low to be used as new index configuration. The new index configuration is compared to the old index configuration, and the existing index configuration is changed using the CREATE statement and DELETE statement of the database, thereby transferring the index configuration in the database to this new index configuration.
When a new query needs to be processed, the steps S3 to S6 are repeated.
The method for selecting the index on line based on the neural network and the attenuation strategy comprehensively considers the collected data, the construction of the training tree convolutional neural network and the calculation of the index gain by using the attenuation strategy, the recommendation of new index configuration and the modification of the index configuration in the whole process.
In the invention, the tree convolution neural network is provided with the detector capable of identifying the structural mode, and can observe the special structure in the query execution plan, so that the method has higher accuracy compared with a method for calculating the query cost by using a database optimizer.
The method comprises the steps of setting a manually-set attenuation parameter, wherein the value of the attenuation parameter is between 0 and 1, multiplying the gain of an index obtained from the past query by the attenuation parameter when calculating the total gain of the index, and multiplying the attenuation parameter by the older query more times, so that the influence of the newer query on the total gain of the index is ensured to be larger, and the change trend of the future query can be captured.
The method comprises the steps of collecting data from log files of a database to train a tree convolutional neural network, using a hash table to save index benefits of past queries and provide a data structure required by an attenuation strategy, disabling existing index configuration in the database before calculating the index benefits so as to avoid influence on calculating the index benefits, and enabling the index configuration after calculating.
For the online index selection method based on the neural network and the attenuation strategy, firstly, the invention provides a method for calculating the query execution cost by using a tree convolution neural network. Therefore, the method can capture some specific tree structures in the query plan, and finally has higher accuracy when calculating the execution cost of the query; the invention designs a method for calculating the total gain of indexes under dynamic environment, namely the dynamic environment is continuously provided with an environment where queries arrive, and in the environment, the queries arriving close in time are similar, so that the weight of old queries in calculating index gain needs to be reduced, and the requirement can be met by the attenuation strategy.
In a possible implementation manner, the embodiment of the present invention further provides a readable storage medium, where the readable storage medium is located in a PLC (Programmable Logic Controller ) controller, and a computer program is stored on the readable storage medium, where the program is executed by a processor to implement the following steps of the online index selection method based on a neural network and an attenuation policy:
training the tree convolution neural network to enable the tree convolution neural network to output corresponding execution cost based on the query execution plan;
constructing a hash table, wherein keys of the constructed hash table are columns capable of establishing indexes in a database, and values are arrays, and the arrays are used for storing benefits of indexes on corresponding columns to past queries;
adopting an attenuation strategy, and obtaining a new value of the value at each position of each array in the hash table based on the attenuation parameters;
Disabling the existing indexes in the database, and obtaining default execution cost of the new query under default index configuration based on the tree convolutional neural network;
Calculating the income of the index pair of each column in the database based on the default execution cost of the newly-arrived query, and adding the income into the tail part of the corresponding array in the hash table;
Enabling existing indexes based on index instructions in a database, sequencing the indexes according to total benefits, and selecting a preset number of indexes from high to low as new index configuration.
The storage media may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Referring to fig. 2, an online index selection device based on a neural network and an attenuation policy provided by an embodiment of the present invention includes a training module, a building module, a first computing module, a disabling module, a second computing module, and an enabling module.
The system comprises a training module, a constructing module, a first calculating module, a disabling module, a second calculating module and an enabling module, wherein the training module is used for training the tree convolutional neural network to enable the tree convolutional neural network to output corresponding execution cost based on a query execution plan, the constructing module is used for constructing a hash table, keys of the constructed hash table are used for each column capable of establishing indexes in a database, the values are arrays used for saving benefits of indexes on the corresponding columns to past queries, the first calculating module is used for obtaining new values of the values on each position of each array in the hash table based on attenuation parameters by adopting an attenuation strategy, the disabling module is used for disabling existing indexes in the database and obtaining default execution cost of the new query under default index configuration based on the tree convolutional neural network, the second calculating module is used for calculating and obtaining benefits of the new query for each column in the database based on default execution cost of the new query, the benefits are added to the tail of the corresponding arrays in the hash table, and the enabling module is used for enabling existing indexes based on index instructions in the database, ordering the indexes according to the total benefits, and selecting the preset number of indexes from high to low as the new index configuration.
The foregoing is only a specific embodiment of the application to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.