[go: up one dir, main page]

CN116383242B - An online index selection method and apparatus based on neural networks and decay strategies - Google Patents

An online index selection method and apparatus based on neural networks and decay strategies

Info

Publication number
CN116383242B
CN116383242B CN202310303515.8A CN202310303515A CN116383242B CN 116383242 B CN116383242 B CN 116383242B CN 202310303515 A CN202310303515 A CN 202310303515A CN 116383242 B CN116383242 B CN 116383242B
Authority
CN
China
Prior art keywords
index
new
query
hash table
array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310303515.8A
Other languages
Chinese (zh)
Other versions
CN116383242A (en
Inventor
王森洋
吴小莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202310303515.8A priority Critical patent/CN116383242B/en
Publication of CN116383242A publication Critical patent/CN116383242A/en
Application granted granted Critical
Publication of CN116383242B publication Critical patent/CN116383242B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本发明公开了一种基于神经网络和衰减策略的在线索引选择方法及装置,涉及数据库领域,该方法包括训练树卷积神经网络,以使树卷积神经网络基于查询执行计划,输出对应的执行代价;构建哈希表;基于衰减参数得到哈希表中每个数组的每个位置上的值的新值;禁用数据库中已存在的索引,并基于树卷积神经网络得到新到查询在默认索引配置下的默认执行代价;基于新到查询的默认执行代价,计算得到数据库中每个列上的索引对新到查询的收益,并将该收益加入哈希表中对应数组的尾部;基于数据库中的索引指令启用已存在索引,并按照总计收益对索引排序,由高至低选取预设个数的索引作为新索引配置。本发明使得在计算查询执行代价时,能有更高的准确性。

This invention discloses an online index selection method and apparatus based on neural networks and a decay strategy, relating to the database field. The method includes: training a tree convolutional neural network to output the corresponding execution cost based on the query execution plan; constructing a hash table; obtaining new values for each position in each array of the hash table based on decay parameters; disabling existing indexes in the database and obtaining the default execution cost of a new query under the default index configuration based on the tree convolutional neural network; calculating the benefit of each column index on the new query for the new query based on the default execution cost of the new query, and adding this benefit to the tail of the corresponding array in the hash table; enabling existing indexes based on index instructions in the database, sorting the indexes according to their total benefit, and selecting a preset number of indexes from high to low as the new index configuration. This invention achieves higher accuracy in calculating query execution costs.

Description

On-line index selection method and device based on neural network and attenuation strategy
Technical Field
The invention relates to the field of databases, in particular to an online index selection method and device based on a neural network and an attenuation strategy.
Background
In recent years, with the entry into the large data age, database systems have become more and more complex, and the amount of data therein has become larger and larger. The database in operation is to process the queries sent by the users continuously, however, when there is a large amount of data in the database, it is not easy to ensure that the database can have a faster query processing speed.
In general, in order to reduce the execution time of a query and increase the running speed of a database, an appropriate index configuration may be selected for the database. An index is a physical structure built on a column in a database, and occupies a certain storage space, but can improve the execution efficiency of queries in the database. The index configuration is a set of indexes. However, it is difficult for database administrators and users to choose a better index configuration because in a larger database, there are typically hundreds of tables, thousands of columns, and because of memory space limitations, only a small fraction of columns can be selected to build an index, and it is difficult to select the appropriate columns among so many columns to build an index. And because new queries are continually sent to the database, database administrators need to continually adjust the index configuration for these new queries, it is more cumbersome to select a valid index configuration in such a dynamic environment.
Index selection problems have been a hotspot problem in the database field, and many schemes have been used to deal with this problem in recent decades, most of which use a database optimizer to generate an execution plan for a query, and use the estimated execution cost of the query in the execution plan to calculate the return of the index. The database optimizer is not able to accurately estimate the execution cost of the query, which results in inaccurate estimates of index returns. Moreover, in the existing index selection methods, most of the methods are used for solving the problem of offline index selection. In the off-line index selection problem, the environment in which the method faces is static, i.e., the method needs to optimize a fixed set of queries for which to select the appropriate index configuration. However, as mentioned above, in practice, the offline index selection method cannot be used in a real-world environment because the queries are constantly coming and there is no fixed set of queries. Therefore, how to design an online index selection method capable of accurately calculating index benefits and continuously optimizing new queries is a current urgent problem to be solved.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide an online index selection method and device based on a neural network and an attenuation strategy, so that higher accuracy can be realized when the execution cost of the query is calculated.
In order to achieve the above purpose, the invention provides an online index selection method based on a neural network and an attenuation strategy, which specifically comprises the following steps:
training the tree convolution neural network to enable the tree convolution neural network to output corresponding execution cost based on the query execution plan;
constructing a hash table, wherein keys of the constructed hash table are columns capable of establishing indexes in a database, and values are arrays, and the arrays are used for storing benefits of indexes on corresponding columns to past queries;
adopting an attenuation strategy, and obtaining a new value of the value at each position of each array in the hash table based on the attenuation parameters;
Disabling the existing indexes in the database, and obtaining default execution cost of the new query under default index configuration based on the tree convolutional neural network;
Calculating the income of the index pair of each column in the database based on the default execution cost of the newly-arrived query, and adding the income into the tail part of the corresponding array in the hash table;
Enabling existing indexes based on index instructions in a database, sequencing the indexes according to total benefits, and selecting a preset number of indexes from high to low as new index configuration.
Based on the technical scheme, the training tree convolutional neural network is used for enabling the tree convolutional neural network to output corresponding execution cost based on the query execution plan, and the specific steps comprise:
Acquiring an executed query execution plan and corresponding execution cost in a database;
the method comprises the steps of forming a data set by an acquired query execution plan and execution cost, and dividing data in the data set into a training set and a testing set according to a preset proportion;
Training the tree convolution neural network by using the training set, judging the training result of the tree convolution neural network by using the testing set, and obtaining the trained tree convolution neural network after the tree convolution neural network converges.
On the basis of the technical scheme, the hash table is constructed, and the specific construction process is as follows:
Traversing all columns and indexes of each table in the database, and taking the column without indexes as a column capable of establishing indexes;
and assigning the index-built columns and a null array to the hash table as key value pairs one by one to realize the construction of the hash table.
Based on the technical scheme, the method adopts an attenuation strategy, obtains a new value of the value at each position of each array in the hash table based on the attenuation parameter, and specifically comprises the following steps:
Establishing a two-layer for loop, wherein the first layer for loop is used for circularly traversing each array in the hash table, and the second layer for loop is used for circularly traversing each position in the array;
multiplying the value of each position of each array in the hash table by the attenuation parameter to obtain a new value as the value of the current position.
Based on the technical scheme, the tree convolutional neural network obtains the default execution cost of the new query under the default index configuration, and the specific steps include:
Invoking a database optimizer to obtain a query execution plan for the new query;
the new query execution plan is input into the tree convolutional neural network, and the output is the execution cost of the new query under the default index configuration, namely the default execution cost.
Based on the technical scheme, the obtaining of the income of the index pair on each column in the database based on the default execution cost of the income inquiry comprises the following specific steps:
for each column appearing in the new query, establishing a virtual index on the current column, and calculating to obtain the query execution cost of the current column at the moment based on a database optimizer and a tree convolutional neural network;
Subtracting the query execution cost of the current column from the default execution cost of the new query to obtain the benefit of the index on the current column on the new query.
Based on the technical scheme, the adding of the benefit to the tail of the corresponding array in the hash table is specifically as follows:
The benefit of the new-to-query is added to the tail of the corresponding array in the hash table and the virtual index is deleted, and for columns that do not appear in the new-to-query, 0 is added to the tail of the corresponding array in the hash table.
Based on the technical scheme, after adding the income from the new query to the tail of the corresponding array in the hash table, the method further comprises the following steps:
judging whether the length of the current array is larger than the preset length, if so, deleting the head element of the current array, and if not, not processing.
Based on the technical scheme, the indexes are ordered according to total benefits, and a preset number of indexes are selected from high to low to serve as new index configuration, and the specific steps include:
calculating the sum of each array in the hash table, and taking the sum as the total income of the index on the corresponding column;
And sorting the indexes according to the total income, selecting a preset number of indexes from high to low as new index configuration, and replacing the existing index configuration in the database with the new index configuration.
The invention provides an online index selection device based on a neural network and an attenuation strategy, which comprises the following components:
the training module is used for training the tree convolution neural network so that the tree convolution neural network outputs corresponding execution cost based on the query execution plan;
the construction module is used for constructing a hash table, keys of the constructed hash table are columns which can be used for establishing indexes in the database, the values are arrays, and the arrays are used for saving benefits of the indexes on the corresponding columns to the past query;
the first calculation module is used for obtaining a new value of the value at each position of each array in the hash table based on the attenuation parameters by adopting an attenuation strategy;
the disabling module is used for disabling the existing indexes in the database and obtaining default execution cost of the new query under default index configuration based on the tree convolution neural network;
the second calculation module is used for calculating the income of the index pair of each column in the database to the new query based on the default execution cost of the new query, and adding the income to the tail of the corresponding array in the hash table;
and the starting module is used for starting the existing indexes based on the index instructions in the database, sequencing the indexes according to the total income, and selecting a preset number of indexes from high to low as new index configuration.
Compared with the prior art, the invention has the advantages that:
(1) The invention calculates the execution cost of the query by using the tree convolution neural network, and the network uses the detector capable of identifying the specific tree structure and can capture the specific tree structure existing in the query plan, thereby having higher accuracy when calculating the execution cost of the query;
(2) The method uses the attenuation strategy to calculate the total gain of the index, and the strategy ensures that the more recent queries occupy larger weight when calculating the total gain of the index, and can finally obtain more reasonable total gain of the index;
(3) Compared with the existing index selection method, the method provided by the invention can be well performed in a dynamic environment with continuous query arrival.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an online index selection method based on a neural network and an attenuation strategy in an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an online index selection device based on a neural network and an attenuation strategy in an embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application.
The present invention uses a tree convolutional neural network to estimate the execution cost of a query. Tree convolutional neural networks were developed from conventional convolutional neural networks that use probes that identify a particular tree structure. Because the execution plan generated by the database optimizer has a tree-like structure, the tree convolutional neural network can capture some specific structures in the tree-like structure, and obtain more accurate numerical values than the execution cost contained in the execution plan. In a dynamic environment where queries arrive continuously, the longer the queries are, the less relevant the queries are to the latest queries, and the weight of old queries in calculating index returns needs to be reduced. Therefore, the invention designs an attenuation strategy for calculating the total income of the index in the dynamic environment, which ensures that the more recent inquiry has larger influence on the index income and can more effectively adjust the existing index configuration.
Referring to fig. 1, the method for selecting an online index based on a neural network and an attenuation strategy according to the embodiment of the present invention specifically includes the following steps:
s1, training a tree convolutional neural network to enable the tree convolutional neural network to output corresponding execution cost, namely execution time, based on a query execution plan;
That is, the execution time of the queries executed by the database is collected, the queries are input into the database optimizer, the execution plans of the queries can be obtained, and the tree convolutional neural network is trained by the execution plan and the actual execution time of the queries. After being trained, the tree convolution neural network can output an accurate estimated execution cost by taking the execution plan of the query as input. In the following steps, the tree convolutional neural network will replace the database optimizer to calculate the execution cost of the query.
In the invention, training the tree convolution neural network to make the tree convolution neural network output corresponding execution cost based on the query execution plan, specifically comprising the following steps:
S101, acquiring an executed query execution plan and corresponding execution cost in a database;
S102, forming a data set by the acquired query execution plan and execution cost, and dividing the data in the data set into a training set and a testing set according to a preset proportion;
And S103, training the tree convolutional neural network by using a training set, judging the training result of the tree convolutional neural network by using a testing set, and obtaining the trained tree convolutional neural network after the tree convolutional neural network converges.
In the practical application process, the programming language is Python, and the database is PostgreSQL. By searching the log file of the database, the executed queries and the corresponding execution costs in the database can be obtained, the execution plans of the queries can be obtained through EXPLAIN commands, the query execution plans and the corresponding execution costs form a data set, the data in the data set are divided into a training set and a testing set according to the ratio of 9:1, the training set is used for training the tree convolutional neural network, the training result of the tree convolutional neural network is judged by adopting the testing set, and the training can be stopped after the tree convolutional neural network is observed to be converged.
S2, constructing a hash table, wherein keys of the constructed hash table are columns which can be used for establishing indexes in a database, and the values are arrays which are used for storing benefits of indexes on the corresponding columns to the past query;
that is, a preparation work before query processing is performed, an empty hash table is created, the keys of the hash table are each index-built column in the database, and the value is an array. For example, the maximum value of each array length is w, i.e., the index on the corresponding column is reserved in the array at most for the benefit of the past w queries.
In the invention, a hash table is constructed, and the specific construction process is as follows:
s201, traversing all columns and indexes of each table in a database, and taking the columns without indexes as columns capable of establishing indexes;
s202, assigning the index-built columns and a null array to the hash table one by one as key value pairs to realize the construction of the hash table.
In the actual application process, using an \d+ command to check all columns and index conditions of each table in the database one by one, wherein the column without the index is used as an index-establishing column, in Python, a hash table is realized by using a data structure of a direct, and the index-establishing column and a null array are used as key value pairs one by one to be assigned to the hash table, so that the hash table is constructed.
S3, adopting an attenuation strategy, and obtaining a new value of the value at each position of each array in the hash table based on attenuation parameters;
In the invention, a new value of the value at each position of each array in the hash table is obtained based on attenuation parameters by adopting an attenuation strategy, and the specific steps comprise:
s301, establishing a two-layer for loop, wherein the first layer for loop is used for traversing each array in the hash table in a loop mode, and the second layer for loop is used for traversing each position in the array in a loop mode;
S302, multiplying the value of each position of each array in the hash table by the attenuation parameter to obtain a new value as the value of the current position.
The value of each position of each array in the hash table is multiplied by the attenuation parameter as a new value, the attenuation parameter is between 0 and 1, and the value of the attenuation parameter can be continuously adjusted according to the actual environment and the effect of the method in actual application.
S4, disabling the existing indexes in the database, and obtaining default execution cost of the new query under default index configuration based on the tree convolutional neural network;
Since there are previously recommended index configurations in the database that have an impact on evaluating the execution cost of the query, the instructions in the database need to be invoked first to disable the existing index. Specifically, the index in the database may be disabled using the update pg_index SET INDISVALID =false instruction.
In the invention, the default execution cost of the new query under the default index configuration is obtained based on the tree convolution neural network, and the specific steps comprise:
S401, calling a database optimizer to obtain a query execution plan of the new query;
S402, inputting a query execution plan of the new query into the tree convolutional neural network, and outputting the new query to be the execution cost of the new query under the default index configuration, namely the default execution cost.
And calling the database optimizer by using EXPLAIN commands to obtain an execution plan of the new-to-query, taking the execution plan as the input of the tree convolutional neural network, and finally obtaining the execution cost of the new-to-query under the default index configuration.
S5, calculating the income of the index pair of each column in the database to the new query based on the default execution cost of the new query, and adding the income to the tail of the corresponding array in the hash table;
In the invention, based on default execution cost of the new query, the income of indexes on each column in the database to the new query is calculated, and the specific steps include:
s501, for each column appearing in the new query, establishing a virtual index on the current column, and calculating to obtain the query execution cost of the current column at the moment based on a database optimizer and a tree convolution neural network;
s502, subtracting the query execution cost of the current column from the default execution cost of the new query to obtain the benefit of the index on the current column to the new query.
In the actual application process, for each column appearing in the new query, a HypoPG (https:// gitsub. Com/HypoPG/hypopg) plug-in is used to build a virtual index on the column, then according to step S4, the execution cost of the latest query in the presence of the index is obtained, and then the value obtained in step S4 is subtracted by the value obtained in the step, so that the benefit of the index to the new query is obtained.
In the invention, the benefit is added to the tail part of the corresponding array in the hash table, and the method specifically comprises the following steps:
The benefit of the new-to-query is added to the tail of the corresponding array in the hash table and the virtual index is deleted, and for columns that do not appear in the new-to-query, 0 is added to the tail of the corresponding array in the hash table.
The benefits to the query may be added to the tail of the corresponding array in the hash table using the application method of the list in Python.
In one possible implementation, after adding the benefit of the new query to the tail of the corresponding array in the hash table, the method further includes:
judging whether the length of the current array is larger than the preset length, if so, deleting the head element of the current array, and if not, not processing.
Since a new element is added to each array of the hash table, the length of the current array may be greater than w, and thus, if the length of the current array is greater than w, the head element of the array is deleted. The head element is deleted and the other elements are not deleted because the head element is the benefit of the index to the oldest query.
The length of the array can be obtained using the len method, and if the length is greater than w, the head element of the array is deleted using the del statement.
And S6, starting the existing indexes based on the index instruction in the database, sequencing the indexes according to the total income, and selecting the indexes with preset numbers from high to low as new index configuration.
Since the benefit of the index on each column to the new query has now been calculated, the instructions in the database can be invoked to enable the existing index. Specifically, the database existing index may be enabled using the update pg_index SET INDISVALID =true instruction.
In the invention, indexes are ordered according to total income, and a preset number of indexes are selected from high to low to be used as new index configuration, and the specific steps include:
S601, calculating the sum of each array in the hash table, and taking the sum as the total income of the index on the corresponding column;
S602, sorting indexes according to total benefits, selecting a preset number of indexes from high to low as new index configuration, and replacing the existing index configuration in the database with the new index configuration.
In the actual application process, sum statement is used for calculating the sum of each array in the hash table, then the sort statement is used for sorting the arrays, and then indexes with preset numbers are selected from high to low to be used as new index configuration. The new index configuration is compared to the old index configuration, and the existing index configuration is changed using the CREATE statement and DELETE statement of the database, thereby transferring the index configuration in the database to this new index configuration.
When a new query needs to be processed, the steps S3 to S6 are repeated.
The method for selecting the index on line based on the neural network and the attenuation strategy comprehensively considers the collected data, the construction of the training tree convolutional neural network and the calculation of the index gain by using the attenuation strategy, the recommendation of new index configuration and the modification of the index configuration in the whole process.
In the invention, the tree convolution neural network is provided with the detector capable of identifying the structural mode, and can observe the special structure in the query execution plan, so that the method has higher accuracy compared with a method for calculating the query cost by using a database optimizer.
The method comprises the steps of setting a manually-set attenuation parameter, wherein the value of the attenuation parameter is between 0 and 1, multiplying the gain of an index obtained from the past query by the attenuation parameter when calculating the total gain of the index, and multiplying the attenuation parameter by the older query more times, so that the influence of the newer query on the total gain of the index is ensured to be larger, and the change trend of the future query can be captured.
The method comprises the steps of collecting data from log files of a database to train a tree convolutional neural network, using a hash table to save index benefits of past queries and provide a data structure required by an attenuation strategy, disabling existing index configuration in the database before calculating the index benefits so as to avoid influence on calculating the index benefits, and enabling the index configuration after calculating.
For the online index selection method based on the neural network and the attenuation strategy, firstly, the invention provides a method for calculating the query execution cost by using a tree convolution neural network. Therefore, the method can capture some specific tree structures in the query plan, and finally has higher accuracy when calculating the execution cost of the query; the invention designs a method for calculating the total gain of indexes under dynamic environment, namely the dynamic environment is continuously provided with an environment where queries arrive, and in the environment, the queries arriving close in time are similar, so that the weight of old queries in calculating index gain needs to be reduced, and the requirement can be met by the attenuation strategy.
In a possible implementation manner, the embodiment of the present invention further provides a readable storage medium, where the readable storage medium is located in a PLC (Programmable Logic Controller ) controller, and a computer program is stored on the readable storage medium, where the program is executed by a processor to implement the following steps of the online index selection method based on a neural network and an attenuation policy:
training the tree convolution neural network to enable the tree convolution neural network to output corresponding execution cost based on the query execution plan;
constructing a hash table, wherein keys of the constructed hash table are columns capable of establishing indexes in a database, and values are arrays, and the arrays are used for storing benefits of indexes on corresponding columns to past queries;
adopting an attenuation strategy, and obtaining a new value of the value at each position of each array in the hash table based on the attenuation parameters;
Disabling the existing indexes in the database, and obtaining default execution cost of the new query under default index configuration based on the tree convolutional neural network;
Calculating the income of the index pair of each column in the database based on the default execution cost of the newly-arrived query, and adding the income into the tail part of the corresponding array in the hash table;
Enabling existing indexes based on index instructions in a database, sequencing the indexes according to total benefits, and selecting a preset number of indexes from high to low as new index configuration.
The storage media may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Referring to fig. 2, an online index selection device based on a neural network and an attenuation policy provided by an embodiment of the present invention includes a training module, a building module, a first computing module, a disabling module, a second computing module, and an enabling module.
The system comprises a training module, a constructing module, a first calculating module, a disabling module, a second calculating module and an enabling module, wherein the training module is used for training the tree convolutional neural network to enable the tree convolutional neural network to output corresponding execution cost based on a query execution plan, the constructing module is used for constructing a hash table, keys of the constructed hash table are used for each column capable of establishing indexes in a database, the values are arrays used for saving benefits of indexes on the corresponding columns to past queries, the first calculating module is used for obtaining new values of the values on each position of each array in the hash table based on attenuation parameters by adopting an attenuation strategy, the disabling module is used for disabling existing indexes in the database and obtaining default execution cost of the new query under default index configuration based on the tree convolutional neural network, the second calculating module is used for calculating and obtaining benefits of the new query for each column in the database based on default execution cost of the new query, the benefits are added to the tail of the corresponding arrays in the hash table, and the enabling module is used for enabling existing indexes based on index instructions in the database, ordering the indexes according to the total benefits, and selecting the preset number of indexes from high to low as the new index configuration.
The foregoing is only a specific embodiment of the application to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (9)

1.一种基于神经网络和衰减策略的在线索引选择方法,其特征在于,具体包括以下步骤:1. An online index selection method based on neural networks and decay strategies, characterized by comprising the following steps: 训练树卷积神经网络,以使树卷积神经网络基于查询执行计划,输出对应的执行代价;Train a tree convolutional neural network so that it outputs the corresponding execution cost based on the query execution plan; 构建哈希表,且构建的哈希表的键为数据库中每个可建立索引的列,值为数组,所述数组用以保存对应列上的索引对过去查询的收益;Construct a hash table where the key is each indexable column in the database and the value is an array used to store the benefits of indexes on the corresponding columns for past queries. 采用衰减策略,基于衰减参数得到哈希表中每个数组的每个位置上的值的新值;A decay strategy is employed to obtain the new value for each position in each array of the hash table based on the decay parameter; 禁用数据库中已存在的索引,并基于树卷积神经网络得到新到查询在默认索引配置下的默认执行代价;Disable existing indexes in the database and obtain the default execution cost of new queries under the default index configuration based on a tree convolutional neural network; 基于新到查询的默认执行代价,计算得到数据库中每个列上的索引对新到查询的收益,并将该收益加入哈希表中对应数组的尾部;Based on the default execution cost of new queries, the benefit of indexes on each column in the database for new queries is calculated, and this benefit is added to the tail of the corresponding array in the hash table; 基于数据库中的索引指令启用已存在索引,并按照总计收益对索引排序,由高至低选取预设个数的索引作为新索引配置;The existing indexes are enabled based on the index instructions in the database, and the indexes are sorted according to the total benefit, and a preset number of indexes are selected as new indexes from high to low. 其中,所述采用衰减策略,基于衰减参数得到哈希表中每个数组的每个位置上的值的新值,具体步骤包括:The step of employing a decay strategy to obtain the new value of each position in each array of the hash table based on the decay parameter includes the following specific steps: 建立两层的for循环,第一层for循环用于循环遍历哈希表中的每个数组,第二层for循环用于循环遍历数组中的每个位置;Create two nested for loops. The first for loop iterates through each array in the hash table, and the second for loop iterates through each position in the array. 将哈希表中每个数组的每个位置的值乘以衰减参数,得到的新值作为当前位置的值。Multiply the value at each position of each array in the hash table by the decay parameter, and use the new value as the value at the current position. 2.如权利要求1所述的一种基于神经网络和衰减策略的在线索引选择方法,其特征在于,所述训练树卷积神经网络,以使树卷积神经网络基于查询执行计划,输出对应的执行代价,具体步骤包括:2. The online index selection method based on neural networks and decay strategies as described in claim 1, characterized in that the training of the tree convolutional neural network, so that the tree convolutional neural network outputs the corresponding execution cost based on the query execution plan, specifically includes the following steps: 获取数据库中已执行的查询执行计划及对应的执行代价;Retrieve the execution plans and corresponding execution costs of queries that have already been executed in the database; 将获取的查询执行计划和执行代价组成数据集,并按照预设比例将数据集中的数据分为训练集和测试集;The obtained query execution plans and execution costs are combined into a dataset, and the data in the dataset is divided into training set and test set according to a preset ratio; 使用训练集对树卷积神经网络进行训练,并采用测试集判断树卷积神经网络的训练结果,当树卷积神经网络收敛后,即可得到训练完成的树卷积神经网络。The tree convolutional neural network is trained using a training set and the training result is judged using a test set. Once the tree convolutional neural network converges, the trained tree convolutional neural network can be obtained. 3.如权利要求1所述的一种基于神经网络和衰减策略的在线索引选择方法,其特征在于,所述构建哈希表,具体的构建过程为:3. The online index selection method based on neural networks and decay strategies as described in claim 1, characterized in that the specific construction process of the hash table is as follows: 遍历数据库中每个表的所有列和索引,将不存在索引的列作为可建立索引的列;Iterate through all columns and indexes of each table in the database, and identify columns without existing indexes as columns that can be indexed. 逐个将可建立索引的列和一个空数组作为键值对赋值给哈希表,实现哈希表的构建。The hash table is constructed by assigning each indexable column and an empty array as key-value pairs to the hash table. 4.如权利要求1所述的一种基于神经网络和衰减策略的在线索引选择方法,其特征在于,所述基于树卷积神经网络得到新到查询在默认索引配置下的默认执行代价,具体步骤包括:4. The online index selection method based on neural networks and decay strategies as described in claim 1, characterized in that the step of obtaining the default execution cost of a new query under the default index configuration based on a tree convolutional neural network specifically includes: 调用数据库优化器得到新到查询的查询执行计划;The database optimizer is invoked to obtain the query execution plan for the new query; 将新到查询的查询执行计划输入树卷积神经网络,输出则为新到查询在默认索引配置下的执行代价,即默认执行代价。The execution plan of the newly arrived query is input into the tree convolutional neural network, and the output is the execution cost of the newly arrived query under the default index configuration, i.e., the default execution cost. 5.如权利要求1所述的一种基于神经网络和衰减策略的在线索引选择方法,其特征在于,所述基于新到查询的默认执行代价,计算得到数据库中每个列上的索引对新到查询的收益,具体步骤包括:5. The online index selection method based on neural networks and decay strategies as described in claim 1, characterized in that the step of calculating the benefit of the index on each column of the database for the new query based on the default execution cost of the new query specifically includes: 对于每个在新到查询中出现的列,在当前列上建立虚拟索引,并基于数据库优化器和树卷积神经网络计算得到此时当前列的查询执行代价;For each column that appears in the new query, a virtual index is created on the current column, and the query execution cost of the current column is calculated based on the database optimizer and tree convolutional neural network. 将新到查询的默认执行代价减去此时当前列的查询执行代价,得到当前列上的索引对新到查询的收益。Subtracting the current column's query execution cost from the default execution cost of the new query gives the benefit of the index on the current column for the new query. 6.如权利要求5所述的一种基于神经网络和衰减策略的在线索引选择方法,其特征在于,所述将该收益加入哈希表中对应数组的尾部,具体为:6. The online index selection method based on neural networks and decay strategies as described in claim 5, characterized in that, adding the gain to the tail of the corresponding array in the hash table specifically comprises: 将新到查询的收益加入哈希表中对应数组的尾部,并删除虚拟索引,且对于没有在新到查询中出现的列,将0加入哈希表中对应数组的尾部。Add the earnings from new queries to the end of the corresponding array in the hash table, and delete the virtual index. For columns that do not appear in new queries, add 0 to the end of the corresponding array in the hash table. 7.如权利要求6所述的一种基于神经网络和衰减策略的在线索引选择方法,其特征在于,当将新到查询的收益加入哈希表中对应数组的尾部之后,还包括:7. The online index selection method based on neural networks and decay strategies as described in claim 6, characterized in that, after adding the revenue of a newly arrived query to the tail of the corresponding array in the hash table, it further includes: 判断当前数组的长度是否大于预设长度,若是,则删除当前数组的头元素,若否,则不做处理。Determine if the length of the current array is greater than the preset length. If it is, delete the first element of the current array; otherwise, do nothing. 8.如权利要求1所述的一种基于神经网络和衰减策略的在线索引选择方法,其特征在于,所述按照总计收益对索引排序,由高至低选取预设个数的索引作为新索引配置,具体步骤包括:8. The online index selection method based on neural networks and decay strategies as described in claim 1, characterized in that the step of sorting the indexes according to total revenue and selecting a preset number of indexes from high to low as new index configuration specifically includes: 计算哈希表中每个数组的和,将其作为对应列上索引的总计收益;Calculate the sum of each array in the hash table and use it as the total gain of the index on the corresponding column; 按照总计收益对索引排序,由高至低选取预设个数的索引作为新索引配置,并将数据库中已存在索引配置替换为新索引配置。Sort the indexes according to total revenue, select a preset number of indexes from high to low as new index configurations, and replace the existing index configurations in the database with the new index configurations. 9.一种基于神经网络和衰减策略的在线索引选择装置,其特征在于,包括:9. An online index selection device based on neural networks and a decay strategy, characterized in that it comprises: 训练模块,其用于训练树卷积神经网络,以使树卷积神经网络基于查询执行计划,输出对应的执行代价;The training module is used to train the tree convolutional neural network so that the tree convolutional neural network outputs the corresponding execution cost based on the query execution plan; 构建模块,其用于构建哈希表,且构建的哈希表的键为数据库中每个可建立索引的列,值为数组,所述数组用以保存对应列上的索引对过去查询的收益;The building module is used to build a hash table, where the key of the hash table is each indexable column in the database and the value is an array used to store the benefits of the index on the corresponding column for past queries. 第一计算模块,其用于采用衰减策略,基于衰减参数得到哈希表中每个数组的每个位置上的值的新值;The first calculation module is used to obtain the new value of the value at each position of each array in the hash table based on the decay parameter using a decay strategy. 禁用模块,其用于禁用数据库中已存在的索引,并基于树卷积神经网络得到新到查询在默认索引配置下的默认执行代价;The disable module is used to disable existing indexes in the database and obtain the default execution cost of new queries under the default index configuration based on a tree convolutional neural network; 第二计算模块,其用于基于新到查询的默认执行代价,计算得到数据库中每个列上的索引对新到查询的收益,并将该收益加入哈希表中对应数组的尾部;The second calculation module is used to calculate the benefit of the index on each column of the database for the new query based on the default execution cost of the new query, and add the benefit to the tail of the corresponding array in the hash table; 启用模块,其用于基于数据库中的索引指令启用已存在索引,并按照总计收益对索引排序,由高至低选取预设个数的索引作为新索引配置;The enable module is used to enable existing indexes based on index instructions in the database, and sort the indexes according to total revenue, selecting a preset number of indexes from high to low as new index configurations; 其中,所述采用衰减策略,基于衰减参数得到哈希表中每个数组的每个位置上的值的新值,具体步骤包括:The step of employing a decay strategy to obtain the new value of each position in each array of the hash table based on the decay parameter includes the following specific steps: 建立两层的for循环,第一层for循环用于循环遍历哈希表中的每个数组,第二层for循环用于循环遍历数组中的每个位置;Create two nested for loops. The first for loop iterates through each array in the hash table, and the second for loop iterates through each position in the array. 将哈希表中每个数组的每个位置的值乘以衰减参数,得到的新值作为当前位置的值。Multiply the value at each position of each array in the hash table by the decay parameter, and use the new value as the value at the current position.
CN202310303515.8A 2023-03-23 2023-03-23 An online index selection method and apparatus based on neural networks and decay strategies Active CN116383242B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310303515.8A CN116383242B (en) 2023-03-23 2023-03-23 An online index selection method and apparatus based on neural networks and decay strategies

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310303515.8A CN116383242B (en) 2023-03-23 2023-03-23 An online index selection method and apparatus based on neural networks and decay strategies

Publications (2)

Publication Number Publication Date
CN116383242A CN116383242A (en) 2023-07-04
CN116383242B true CN116383242B (en) 2025-11-25

Family

ID=86976178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310303515.8A Active CN116383242B (en) 2023-03-23 2023-03-23 An online index selection method and apparatus based on neural networks and decay strategies

Country Status (1)

Country Link
CN (1) CN116383242B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807041A (en) * 2019-11-01 2020-02-18 广州华多网络科技有限公司 Index recommendation method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11386086B2 (en) * 2018-08-30 2022-07-12 International Business Machines Corporation Permutation-based machine learning for database query optimization
CN115658681B (en) * 2022-09-20 2025-06-20 西北工业大学 A DQN database index recommendation method based on tree-shaped invalid action masking

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807041A (en) * 2019-11-01 2020-02-18 广州华多网络科技有限公司 Index recommendation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN116383242A (en) 2023-07-04

Similar Documents

Publication Publication Date Title
US9652497B2 (en) Processing queries using hybrid access paths
US20050240570A1 (en) Partial query caching
US9892159B2 (en) Distance-based logical exploration in a relational database query optimizer
CN108897842A (en) Computer readable storage medium and computer system
CN112052404A (en) Group discovery method, system, device and medium for multi-source heterogeneous relational network
CN110637292B (en) Systems and methods for querying resource caches
CN120723757B (en) Data warehouse multi-hop relation quick retrieval method and system based on graph embedded index
CN118689879B (en) Target index recommendation method, electronic device and computer-readable storage medium
CN108304585A (en) A kind of result data choosing method and relevant apparatus based on spatial key search
CN114692978A (en) Social media user behavior prediction method and system based on big data
US8548980B2 (en) Accelerating queries based on exact knowledge of specific rows satisfying local conditions
CN118447127A (en) Method, device, equipment, medium and product for generating flow chart based on flow engine
US20190220924A1 (en) Method and device for determining key variable in model
CN120371858B (en) Intelligent NL2SQL generating and optimizing system, application method and storage medium
CN116383242B (en) An online index selection method and apparatus based on neural networks and decay strategies
CN112818010B (en) Database query method and device
CN110968267B (en) Data management method, device, server and system
CN119558961A (en) A risk prediction method, device, electronic device, storage medium and program product
CN114461892B (en) Evaluation method for location retrieval input prompt service, electronic device and program product
CN119620959B (en) Elastic telescoping method, device, medium and program product for storage space
CN116523038B (en) Data monitoring methods, devices, computer equipment and storage media
CN111199002A (en) Information processing method and device
CN100367216C (en) Fast Event Matching Method in Event Model
US20230281099A1 (en) User-impact index for applications serving users
CN120179920A (en) A social network key node mining method and mining system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant