Disclosure of Invention
The embodiment of the application aims to provide a method and a device for executing a structured query language task and electronic equipment, so as to improve the robustness of SQL task execution. The specific technical scheme is as follows:
in a first aspect of an embodiment of the present application, there is provided a method for executing a structured query language task, the method including:
Acquiring a target structured query language statement aiming at a target database;
determining a physical target table involved in the target structured query language statement as a first physical target table;
Determining a target table which does not exist in the target database in each first physical target table as a second physical target table;
For each second physical target table, determining a selection statement associated with the second physical target table in the target structured query language statement, and generating the second physical target table in the target database according to a field related to the selection statement;
and executing the structured query language task indicated by the target structured query language statement based on the physical source data table and the physical target table existing in the target database.
In one possible implementation manner, the determining, as the first physical target table, a physical target table involved in the target structured query language statement includes:
converting the single structured query language statement into SQLNode for each single structured query language statement in the target structured query language statement, each node in SQLNode being used to represent an element in the single structured query language statement;
And traversing each SQLNode respectively, determining each node used for representing the physical target table, and taking the physical target table represented by each determined node as a first physical target table.
In one possible embodiment, before the traversing each SQLNode separately, the method further includes:
determine SQLNode that includes a create view node, wherein the create view node is a node for representing a create view class;
And converting the created view node into a grammar subtree at each determined SQLNode, wherein each node in the grammar subtree is used for representing each insertion class and selection class referencing the created view class represented by the created view node.
In a possible implementation manner, the executing the structured query language task indicated by the target structured query language statement includes:
converting the single structured query language statement into RelNode for each single structured query language statement in the target structured query language statement, each node in RelNode being used to represent a step in a structured query language subtask indicated by the single structured query language statement;
Determining a logic data source table and a logic target table involved in each RelNode as a first logic data source table and a first logic target table;
according to a preset mapping relation between a first field and a second field and a mapping relation between a first field type and a second field type, adjusting the first field existing in the logic data source table and the logic target table to be corresponding second fields, and adjusting the first field type existing in the logic data source table and the logic target table to be corresponding second field types to obtain a second logic data source table and a second logic target table, wherein the first field and the first field type are fields and field types which are not supported by a computing engine, and the second field type are fields and field types which are supported by the computing engine;
And executing the structured query language task indicated by the target structured query language statement according to the second logic data source table and the second logic target table by the computing engine.
In one possible embodiment, the method further comprises:
for each first logic data source table and each first logic target table, acquiring metadata information related to the table in a target database as first metadata information;
According to a preset mapping rule between a first parameter and a second parameter, the first parameter existing in the first metadata information is adjusted to be a corresponding second parameter, so that second metadata information is obtained, wherein the first parameter is a parameter which is not supported by the computing engine, and the second parameter is a parameter which is supported by the computing engine;
The executing, by the computing engine, the structured query language task indicated by the target structured query language statement according to the second logical data source table and the second logical target table, including:
And executing the structured query language task indicated by the target structured query language statement according to the second logic data source table, the second logic target table and the second metadata information by the computing engine.
In a second aspect of the embodiment of the present application, there is provided a structured query language task execution apparatus, including:
The acquisition module is used for acquiring target structured query language sentences aiming at the target database;
the first determining module is used for determining a physical target table involved in the target structured query language statement as a first physical target table;
The second determining module is used for determining a target table which does not exist in the target database in each first physical target table as a second physical target table;
the generation module is used for determining a selection statement related to the second physical target table in the target structured query language statement for each second physical target table, and generating the second physical target table in the target database according to a field related to the selection statement;
And the execution module is used for executing the structured query language task indicated by the target structured query language statement based on the physical source data table and the physical target table existing in the target database.
In a possible implementation manner, the first determining module is specifically configured to:
converting the single structured query language statement into SQLNode for each single structured query language statement in the target structured query language statement, each node in SQLNode being used to represent an element in the single structured query language statement;
And traversing each SQLNode respectively, determining each node used for representing the physical target table, and taking the physical target table represented by each determined node as a first physical target table.
In one possible embodiment, the apparatus further comprises:
A third determining module, configured to determine SQLNode that includes a create view node before traversing each SQLNode, where the create view node is a node used to represent a create view class;
and the node conversion module is used for respectively converting the created view nodes into grammar subtrees at the determined SQLNode, wherein each node in the grammar subtrees is used for representing each insertion class and each selection class of the created view class represented by the created view node.
In a third aspect of an embodiment of the present application, there is provided an electronic device, including:
A memory for storing a computer program;
And the processor is used for realizing the method for executing the structured query language task according to any one of the first aspect when executing the program stored in the memory.
In a fourth aspect of embodiments of the present application, there is provided a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the structured query language task execution method of any of the preceding aspects.
The embodiment of the application also provides a computer program product containing instructions, which when run on a computer, cause the computer to execute the structured query language task execution method described in any one of the above.
The embodiment of the application has the beneficial effects that:
the embodiment of the application provides a structured query language task execution method, which comprises the steps of obtaining target structured query language sentences aiming at a target database, determining target tables which are not in the target database in each first physical target table as first physical target tables according to physical target tables related to the target structured query language sentences, determining selection sentences related to the second physical target tables in the target structured query language sentences according to each second physical target table, generating the second physical target tables in the target database according to fields related to the selection sentences, and finally executing the structured query language task indicated by the target structured query language sentences based on the physical source data tables and the physical target tables in the target database. According to the method provided by the embodiment of the application, the new physical target table can be determined according to the target structured query language statement, so that when a user issues the SQL task, even if the physical target table changes, the new physical target table obtained based on the target structured query language statement can be used, the calculation engine can accurately execute the SQL task according to the physical source data table and the new physical target table existing in the target database, the new physical target table does not need to be created in advance, the calculation engine can accurately execute the SQL task, and the execution robustness of the SQL task is improved.
Of course, it is not necessary for any one product or method of practicing the application to achieve all of the advantages set forth above at the same time.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. Based on the embodiments of the present application, all other embodiments obtained by the person skilled in the art based on the present application are included in the scope of protection of the present application.
For a clearer description of the data query method provided by the present application, the following will explain the related terms referred to herein:
SQL, the full scale Structured Query Language, i.e., the structured query language, is a standard programming language that is specifically used to manage and manipulate databases, allowing users to perform queries, updates, and management of data.
SQL execution plan the database management system, upon receiving the SQL query statement, generates a detailed execution step, called SQL execution plan, based on the content queried by the SQL query statement. The SQL execution plan is the core for guiding the database management system to perform operations such as data query, update and management, and determines the efficiency and performance of the query.
A computing engine, a component or framework for processing and executing computing tasks, provides real-time or near real-time data computing capabilities, and often has some metadata management capabilities itself.
Metadata, which is data describing attribute information of the data, is used for supporting functions of indicating storage positions, historical data, resource searching, file recording and the like.
The physical target table is also called a physical sink (output component in data flow processing) table, and refers to a specific data source table for outputting data from a calculation engine in a flow processing flow.
Physical source table, also known as physical source table, refers to a specific data source table of data input calculation engine in stream processing.
The stream processing task SQL is used for submitting the SQL of the task, and is generally composed of three parts, namely a source table, a sink table and a business logic SQL, wherein the sink table is a memory table mapped by a physical sink table of a real data source in a computing engine, the source table is a memory table mapped by a physical source table of the real data source in the computing engine, and the sink table and the source table are composed of table names, fields, field types, data source connection parameters and computing engine high-order parameters.
SQLNode also called SQL nodes are abstract syntax tree nodes of SQL sentences, and represent various constituent parts of the SQL sentences, such as select, from, where, join, etc. SQLNode is the product of the SQL parsing stage, which parses SQL statements into a tree structure for subsequent processing.
RelNode the abstract Node of the Relational expression, which represents the logic execution plan after query optimization, is called the Relational Node, and metadata information of the corresponding source table and the target table is introduced into RelNode.
Business SQL refers to SQL related to data stream processing logic in stream computation, and related to the task itself are CREATE VIEW (create view), select, and insert statements.
In the related art, when a query analysis task is executed, in order to simplify the flow of the query analysis, a physical source table and a physical target table are mapped in a computing engine to obtain a source table and a target table, and the source table and the target table are persisted in the computing engine, as shown in fig. 1, which is a comparison diagram of the target table/source table and the physical target table/physical source table, if the query is required to be performed based on data in the physical source table 1 and the physical source table 2 when the query analysis task is executed, to obtain a physical target table containing a query analysis result, then the source table 1 and the source table 2 corresponding to the physical source table 1 and the physical source table 2 respectively need to be created in the computing engine 100 according to the data in the physical source table, the target table corresponding to the physical target table is created in the computing engine 100, and the source table 1, the source table 2 and the target table are persisted in the computing engine 100, and when the query is executed, the computing engine 100 creates a view based on the source table 1 and the source table 2, and the query is required to be converted and synchronized to the physical target table.
However, if the user query analysis requirement changes, that is, the physical source table and the physical target table change, the source table and the target table are permanently stored in the computing engine, and the source table, the target table, and the new physical source table and the new physical target table stored in the computing engine are not identical, the computing engine cannot create a view according to the new physical source table and the new physical target table and perform query and analysis, so that the SQL task cannot be executed.
In order to improve the robustness of the execution of the SQL task, in a first aspect of the embodiment of the present application, a method for executing a structured query language task is provided, referring to fig. 2, fig. 2 is a first flowchart of the method for executing a structured query language task provided in the embodiment of the present application, where the method includes the following steps:
step S10, acquiring a target SQL statement aiming at a target database;
step S20, determining a physical target table involved in a target SQL statement as a first physical target table;
step S30, determining a target table which does not exist in the target database in each first physical target table as a second physical target table;
Step S40, for each second physical target table, determining a selection statement associated with the second physical target table in the target SQL statement, and generating the second physical target table in the target database according to the field related to the selection statement;
And step S50, executing the SQL task indicated by the target structured query language statement based on the physical source data table and the physical target table existing in the target database.
According to the method provided by the embodiment of the application, the new physical target table can be determined according to the target structured query language statement, so that when a user issues the SQL task, even if the physical target table changes, the new physical target table obtained based on the target structured query language statement can be used, the calculation engine can accurately execute the SQL task according to the physical source data table and the new physical target table existing in the target database, the new physical target table does not need to be created in advance, the calculation engine can accurately execute the SQL task, and the execution robustness of the SQL task is improved.
The above step S10 to step S50 will be described below:
In step S10, the target SQL statement is a statement constructed based on a task that the user wants to perform for a certain database, and according to an operation that the user wants to perform, the SQL statement may include DDL (Data Definition Language ) statements such as a statement for Creating (CREATE), modifying (ALTER) and Deleting (DROP) objects of the database, tables, views, indexes, etc., DML (Data Manipulation Language ) statements such as a statement for performing operations of Inserting (INSERT), updating (UPDATE) and Deleting (DELETE) data in the database, DQL (Data Query Language ) statements whose core statement is a SELECT (SELECT) statement for retrieving data from the database, DCL (Data Control Language ) statements for controlling access rights of the database user.
Databases include, but are not limited to, one or more of MySQL (Relational Database MANAGEMENT SYSTEM ), kafka (an open source stream processing platform), oracle (an information management software), elastosearch (search server), postgreSQL (an object-relational database management system), starRocks (a database system based on massively parallel processing technology).
It will be appreciated that there are multiple tables containing raw data in the target database, i.e., physical source tables, also known as physical source data tables, and that SQL statements are actually statements built based on tasks that the user wants to perform with respect to the source data tables present in the target database. Illustratively, assume a database storing a physical source table named "sales_orders" storing detailed information of all sales orders, such as order numbers, product information, sales dates, etc., through SQL statement "SELECTFROM samples_descriptors "can extract all the data in the physical source table.
In step S20, the physical target table related to the SQL statement may be stored in the database in advance, or may be obtained when the SQL task indicated by the SQL statement is executed.
However, if the physical target table is obtained when the SQL task indicated by the SQL statement is executed, the physical target table may not be present in the target database, and the physical target table needs to be created in the target database.
After determining the target table that does not exist in the target database as the second physical target table in step S30, an actual second physical target table is created in the target database for each second physical target table in step S40.
In step S40, the selection statement associated with the second physical object table refers to a selection statement concerning the content of the second physical object table among the statements. Since the basic grammar of the selection statement is "SELECT [ column_list ] FROM [ table_name ]", where [ column_list ] is a list of columns to be selected, one or more column names may be used, and wild cards may also be usedTo indicate that all columns are selected, [ table_name ] is the name of the physical source table from which the data is to be retrieved.
The field referred to by the SELECT statement is "[ column_list ]" in the SELECT statement, and since the SELECT statement is a list of columns to be selected from the physical source table as a part of the contents of the second physical destination table, if the second physical destination table does not exist in the destination database, the structure of the second physical destination table can be determined based on the contents retrieved by the SELECT statement from the physical source table, that is, the second physical destination table is generated in the destination database.
For example, if the selection statement associated with the second physical object table is "select_id FROM orders1", "select_ amount FROM orders2", and "order_id" indicates an order number and "order_count" indicates an order amount, the second physical object table includes the order number and the order amount.
In step S50, the SQL task, i.e., a task that the user wants to execute with respect to the source data table existing in the target database, includes a data query task (SQL statement is a SELECT statement), a data insertion task (SQL statement is an INSERT statement), a data UPDATE task (SQL statement is an UPDATE statement), a data deletion task (SQL statement is a DELETE statement), and the like.
There may be only one or more physical target tables involved in the target SQL statement. In the case where the target SQL statement contains multiple SQL statements, there may be more first physical target tables involved in the multiple SQL statements, and determining the physical target tables involved in the SQL statements is time consuming.
Based on this, in a possible implementation manner, as shown in fig. 3, a second flowchart of a structured query language task execution method provided by an embodiment of the present application, the step S20 specifically includes:
Step S201, converting the single SQL statement into SQLNode, SQLNode aiming at each single SQL statement in the target SQL statement, wherein each node in SQLNode, SQLNode is used for representing one element in the single SQL statement;
step S203, each SQLNode is traversed, each node representing the physical object table is determined, and the physical object table represented by each determined node is used as the first physical object table.
The target SQL statement may include only a single SQL statement or may include a plurality of SQL statements. In the case where the target SQL statement is a compound statement composed of a plurality of SQL statements, it is necessary to disassemble the target SQL statement into a plurality of single SQL statements.
Where a single SQL statement refers to a statement that contains only one complete and independent SQL command that is executed at a time, the single SQL statement includes, but is not limited to, a single SELECT statement, a single INSERT statement, a single UPDATE statement, a single DELETE statement, a single CREATE statement. Illustratively, the SQL statement is DELETE FROM students WHERE age < 18, meaning that all student records with ages less than 18 in the "students" table are deleted, which is a single SQL statement and a single DELETE statement, which is used only for data deletion tasks.
Each node SQLNode is used to represent an element in a single SQL statement, and, taking the foregoing statement "DELETE FROM students WHERE age < 18" as an example, "DELETE", "FROM", "WHERE" respectively represent an element of the SQL statement.
By adopting the method of the embodiment of the application, the SQL sentence is converted into SQLNode, and as each node of SQLNode is used for representing one element in a single SQL sentence, the physical target table related to the SQL sentence can be determined and obtained only by traversing the node representing the physical target table in SQLNode, and sentence-by-sentence analysis is not required for each target SQL sentence.
CREATVIEW (create view) statements are used to create a view in the database that is a virtual table based on the results of the SQL query. Since the view does not store the actual data, but only generates data dynamically according to the SQL query, the grammar of CREATVIEW statements must be correct and correct to ensure that the view can be properly created and the query results returned.
In the related art, CREATVIEW sentences are executed in a computing engine, and after a memory table is built, the DML sentences referencing the view can be checked.
Since the basic syntax for creating a view sentence is CREATE VIEW VIEW _ NAME AS SELECT column1, column 2. WHERE view_name is the name of the view that the user wants to create, SELECT column1, column 2. FROM table_name formulates that the view is based on that table query, WHERE condition is an optional filtering operation.
The field behind the field 'CREATE VIEW VIEW _name AS' in the created view sentence is actually a SELECT sentence, and checking the sentence behind the field 'CREATE VIEW VIEW _name AS' in the view sentence is equivalent to checking the created view sentence when checking whether the created view sentence is error-free. Because the view is essentially a virtual table based on SQL query results, SELECT statements are typically included in creating the view statements.
While checking the SELECT statement or INSERT statement does not rely on the calculation engine, in one possible implementation, the create view statement may be rewritten as a SELECT statement prior to performing step S203 described above.
Specifically, as shown in fig. 4, a third flowchart of a method for executing a structured query language task according to an embodiment of the present application is shown, where the method further includes:
Step S2021, determining SQLNode including creating a view node;
wherein the view creation node is a node for representing creation of view class;
Step S2022, converting the created view nodes into grammar subtrees at each SQLNode determined;
Wherein each node in the grammar sub-tree is used for representing each insertion class and selection class of the creation view class represented by the reference creation view node, namely an INSERT class and a SELECT class.
By adopting the method provided by the embodiment of the application, in SQLNode containing the view creating node, the view creating node is converted into the grammar subtree for representing the SELECT statement, so that the capability of analyzing and checking the view creating statement without depending on a calculation engine is realized, and the flexibility and efficiency of SQL query are improved.
It will be appreciated that the data types in the physical source tables and physical destination tables stored in the destination database are different from the data types that the compute engine can identify, and therefore, the physical source tables and physical destination tables need to be converted into the data types that the compute engine can identify before the compute engine performs the SQL task.
Referring to fig. 5, which is a fourth flowchart of a method for executing a structured query language task according to an embodiment of the present application, the step S50 of executing the structured query language task indicated by the target structured query language sentence specifically includes:
Step S501, converting the single SQL statement into RelNode, relNode aiming at each single SQL statement in the target SQL statement, wherein each node in RelNode, relNode is used for representing one step in the SQL subtask indicated by the single SQL statement;
A single SQL statement may be converted to RelNode, e.g., calcite (a data query engine may convert any data query to an SQL-based query) using existing tools. Specifically, a single SQL statement may first be converted SQLNode using the Calcite SQL parser, and since the structure of the physical object table has been determined based on the above and the structure of the physical source table is known, calcite may access the metadata of the physical source table and physical object table for validation, and after validation pass, use SqlToRelConverter (a key component of Calcite) of Calcite to convert SQLNode to logic RelNode.
Step S503, determining a logic data source table and a logic target table related in each RelNode as a first logic data source table and a first logic target table;
Step S505, according to the preset mapping relation between the first field and the second field and the mapping relation between the first field type and the second field type, the first field existing in the logic data source table and the logic target table is adjusted to be the corresponding second field, and the existing first field type is adjusted to be the corresponding second field type, so that a second logic data source table and a second logic target table are obtained;
the first field and the first field type are fields and field types which are not supported by the computing engine, and the second field type are fields and field types which are supported by the preset computing engine;
and S507, executing the SQL task indicated by the target SQL statement according to the second logic data source table and the second logic target table by the computing engine.
By adopting the method of the embodiment of the application, the field types in the logic data source table and the logic target table related in RelNode are adjusted so that the field types in the logic data source table and the logic target table are field types which can be identified by the computing engine, and the problem that SQL tasks cannot be normally executed because the field types in the logic data source table and the logic target table related in RelNode are types which are not identified by the computing engine is avoided.
It can be understood that the mapping relation between the field types supported by the computing engine and the field types that cannot be identified by the computing engine is pre-stored in the target database, and after the logical data source table and the logical target table are obtained, parameters in metadata information related to the table can be adjusted to parameters supported by the computing engine according to the mapping relation pre-stored in the target database.
Referring to fig. 6, fig. 6 is a fifth flowchart of a method for executing a task in a structured query language according to an embodiment of the present application, where the method further includes the following steps:
step S5041, aiming at each first logic data source table and each first logic target table, acquiring metadata information related to the table in a target database as first metadata information;
step S5042, according to a mapping rule between a preset first parameter and a preset second parameter, the first parameter existing in the first metadata information is adjusted to a corresponding second parameter, and second metadata information is obtained;
the first parameter is a parameter which is not supported by the computing engine, and the second parameter is a parameter which is supported by the computing engine;
the step S507 specifically includes:
In step S5071, the structured query language task indicated by the target structured query language statement is executed by the computing engine according to the second logical data source table, the second logical target table, and the second metadata information.
By adopting the method of the embodiment of the application, the logical data source table and the logical target table can be converted into the data types which can be identified by the computing engine because the mapping rule between the parameters supported by the computing engine and the parameters not supported by the computing engine is preset, and the SQL task can be accurately executed when the physical source table and the physical target table are changed.
Based on the foregoing, the method for executing the structured query language task provided by the embodiment of the present application may be considered to be completed in five modules respectively. Fig. 7 is a schematic structural diagram of a module for implementing a task execution method of a structured query language according to an embodiment of the present application, where the module includes an SQL unified parsing sub-module 11, an SQL verification sub-module 12, a task SQL generation sub-module 13, a task submission sub-module 14, and a data collection and storage sub-module 15. The SQL unified analysis sub-module 11, the SQL verification sub-module 12 and the task SQL generation sub-module 13 form a metadata management unit 1 which is arranged outside the computing engine, the metadata management unit 1 can provide the capability of performing SQL analysis and verification independently of the computing engine, and can manage the metadata information of other data sources, so that a user does not need to pay attention to the problems of data mapping between the data sources and the computing engine, data parameter mapping of the data sources, whether a physical target table exists or not, and the like.
As can be seen from fig. 7, in the process of executing the SQL task, the SQL unified analysis sub-module 11, the SQL verification sub-module 12, the task SQL generation sub-module 13, and the task submission sub-module 14 continuously acquire data from the data acquisition and storage sub-module 15. In the embodiment of the application, in the process of executing SQL tasks by the SQL unified analysis sub-module 11, the SQL verification sub-module 12, the task SQL generation sub-module 13 and the task submission sub-module 14, firstly, the SQL unified analysis sub-module 11 executes SQL segmentation and SQL analysis to obtain SQLNode, then the SQL verification sub-module 12 executes sink table verification and sink table generation based on SQLNode to obtain RelNode, then the task SQL generation sub-module 13 executes field type mapping, metadata information mapping and high-order parameter mapping, converts a logic data source table and a logic target table related to RelNode into data supported by a computing engine, and finally, the task submission sub-module 14 executes task SQL verification, task parameter filling and SQL task and task parameter submission to the computing engine.
The following describes the SQL unified parsing sub-module 11, the SQL verification sub-module 12, the task SQL generation sub-module 13, the task submission sub-module 14, and the data acquisition and storage sub-module 15, respectively:
(1) Data acquisition and storage sub-module 15
The sub-module corresponds to the target database in the foregoing, the task generates the core data support of the sub-module, and metadata information of the data source (i.e. heterogeneous data source metadata in fig. 7), the mapping of field types of data source table information and computing engine data types, the mapping of table names, the mapping of connection parameters and other key rules (i.e. the mapping rule of the data source and the computing engine in fig. 7) and the high-level parameters preset by the user through the front-end interface (i.e. the high-level parameter information in fig. 7) are stored in the sub-module.
The information stored in the data acquisition and storage sub-module 15 is continuously stored and utilized during execution of the SQL task.
(2) SQL unified parsing sub-module 11
The SQL unified parsing sub-module 11 functions to parse one or more target SQL statements of input into an abstract syntax tree SQLNode. For example, CREATEVIEW statements would be converted to SQLCREATEVIEW classes, INSERT statements would be converted to SQLINSERT classes, and SELECT statements would be converted to SQLSELECT classes. The submodule has an SQL statement rewriting function, and can reconstruct CREATEVIEW statements, INSERT statements and SELECT statements.
The workflow of the SQL unified parsing sub-module 11 is shown in FIG. 8. First, a target SQL statement composed of a plurality of SQL statements submitted by a user (corresponding to the step S10) is acquired and finely divided into a plurality of single SQL statements, and each single SQL statement is parsed into SQLNode item by means of CALCITE PARSER (corresponding to the step S201), so as to obtain SQLNodelist corresponding to the target SQL statement. Then traversing whether the SQLNodelist corresponding to the target SQL statement contains SQLNode containing CREATVIEW statement, if not, directly temporarily storing SQLNodelist corresponding to the target SQL statement in the memory of the service program running the metadata management unit, clearing the service program after the SQL task is finished, if yes, further checking whether other SQLSELECT statement or SQLINSERT statement refers to the VIEW, if yes, reconstructing CREATVIEW statement into sub-query statement (SQLSELECT statement or SQLINSERT statement), merging with corresponding SQLSELECT statement or SQLINSERT statement (corresponding to step S2021-step S2022), finally obtaining SQLSELECT statement set or SQLINSERT statement set, if not, directly temporarily storing SQLNodelist corresponding to the target SQL statement in the memory of the service program running the metadata management unit, clearing the service program after the SQL task is finished.
(3) Task SQL verification sub-module 13
See FIG. 9 for a workflow diagram of a task SQL check sub-module. After receiving SQLSELECT statement sets or SQLINSERT statement sets output by the SQL unified analysis sub-module 11, the sub-module uses SqlVisitor visitors to traverse all physical target tables in SQLNodelist (corresponding to the step S20 described above), combines the metadata information of the data sources of the data acquisition and storage sub-module 15 to determine whether the physical target tables actually have data sources (corresponding to the step S30 described above), if so, calls an interface to acquire the metadata information of all physical target tables and physical source tables from the data acquisition and storage sub-module 15, performs SQL verification, and then converts SQLNode into RelNode (corresponding to the step S501 described above).
If not, then the fields in SQLINSERT statements for the physical target table need to be removed, the SQLINSERT statement converted to SQLSELECT statement, and the SQLSELECT statement converted to RelNode in combination with the metadata information of the physical source table. Finally, generating a table construction statement of the target table by utilizing the metadata information of the data source in the data acquisition and storage sub-module 15, and after the user determines to execute the SQL, the physical target table in the data acquisition and storage sub-module 15 is already present, and calling an interface to acquire the metadata information of all the physical target tables and the physical source tables from the data acquisition and storage sub-module 15, and converting SQLNode into RelNode.
And finally, temporarily storing all RelNode sets in a memory, and cleaning the SQL task after the SQL task is generated subsequently.
(4) Task SQL generating sub-module 13
See FIG. 10 for a workflow diagram of the task SQL generating sub-module. The submodule acquires the logical data source table and the logical target table (corresponding to the step S503) related in RelNode according to the RelNode set output by the task SQL generating submodule 13, and then adjusts the fields and the field types existing in the logical data source table and the logical target table to the fields and the field types which can be identified by the calculation engine (corresponding to the step S505) so as to ensure that the data source data can be correctly identified by the calculation engine. Then, metadata information related to the table in the data collection and storage sub-module 15, such as parameters in an SQL statement (corresponding to the foregoing step S5041), is obtained, and based on a preset mapping rule, the metadata information of the data source is mapped (i.e. the data source data information mapping in fig. 10) and converted into a writing method specified by the computing engine (corresponding to the foregoing step S5042), so as to generate parameters required by the computing engine to connect to the corresponding data source, for example, a with parameter in the link, a parameter connected to the mysql database, url (uniform resource locator ), driver (driver), and the like. And finally, converting the high-order parameters supplemented by the user obtained by the front end into an engine specified writing method according to a preset mapping rule, supplementing the engine specified writing method into the task SQL, generating the high-order parameters of the task SQL, and further obtaining the final complete task SQL.
Wherein the high-order parameters are tuning parameters located between the compute engine and the data source that do not affect the connection of the compute engine to the data source. The high-order parameters are usually default values, but some users can finely adjust the parameters according to different task requirements. The high-order parameters are divided into source table high-order parameters and target table high-order parameters of the data source. The source table high-order parameters of the data source include a scan partition number parameter (scan. Partition. Num), a scan partition column parameter (scan. Partition. Column), a cache policy parameter (cache), and the target table high-order parameters include an insertion parallelism (sink. Parallel), a cache refresh interval time (sink. Buffer-flush. Interval), a cache maximum storage line number (sink. Buffer-flush. Max-rows), and the like. It will be appreciated that the higher order parameters of different data sources may be different.
(5) Task submission submodule 14
After the task SQL is generated, the task SQL is returned to the front end for verification by the user, and the computing engine also performs verification on the task SQL through the verification interface. After the verification is completed, specific task parameters are set for users to perform task optimization in different degrees before each task SQL is submitted, and after the users perform task parameter configuration through a front-end interface, the sub-module submits the tasks to a computing engine, and the computing engine executes the tasks.
Corresponding to the foregoing first aspect, a second aspect of the embodiment of the present application provides a structured query language task execution device, and as shown in fig. 11, the structured query language task execution device provided by the embodiment of the present application includes:
an acquisition module 1101, configured to acquire a target structured query language sentence for a target database;
A first determining module 1102, configured to determine a physical target table involved in the target structured query language statement as a first physical target table;
a second determining module 1103, configured to determine, in each of the first physical target tables, a target table that does not exist in the target database, as a second physical target table;
A generating module 1104, configured to determine, for each of the second physical target tables, a selection statement associated with the second physical target table in the target structured query language statement, and generate the second physical target table in the target database according to a field related to the selection statement;
An execution module 1105, configured to execute the structured query language task indicated by the target structured query language statement based on the physical source data table and the physical target table existing in the target database.
According to the device provided by the embodiment of the application, the new physical target table can be determined according to the target structured query language statement, so that when a user issues the SQL task, even if the physical target table changes, the new physical target table obtained based on the target structured query language statement can be used, the calculation engine can accurately execute the SQL task according to the physical source data table and the new physical target table existing in the target database, the new physical target table does not need to be created in advance, the calculation engine can accurately execute the SQL task, and the execution robustness of the SQL task is improved.
In a possible implementation manner, the first determining module 1102 is specifically configured to:
converting the single structured query language statement into SQLNode for each single structured query language statement in the target structured query language statement, each node in SQLNode being used to represent an element in the single structured query language statement;
And traversing each SQLNode respectively, determining each node used for representing the physical target table, and taking the physical target table represented by each determined node as a first physical target table.
In one possible embodiment, the apparatus further comprises:
a third determining module, configured to determine SQLNode that includes a created view node before traversing each SQLNode separately, where the created view node is a node used to represent a created view class;
and the node conversion module is used for respectively converting the created view nodes into grammar subtrees at the determined SQLNode, wherein each node in the grammar subtrees is used for representing each insertion class and each selection class of the created view class represented by the created view node.
The embodiment of the application also provides an electronic device, as shown in fig. 12, including:
A memory 1201 for storing a computer program;
the processor 1202, when executing the program stored in the memory 1201, performs the following steps:
acquiring a target SQL statement aiming at a target database;
determining a physical target table involved in a target SQL statement as a first physical target table;
determining a target table which does not exist in the target database in each first physical target table as a second physical target table;
for each second physical target table, determining a selection statement associated with the second physical target table in the target SQL statement, and generating the second physical target table in the target database according to a field related to the selection statement;
Based on the physical source data table and the physical target table existing in the target database, executing the SQL task indicated by the target structured query language statement.
The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The Processor may be a general-purpose Processor including a central processing unit (Central Processing Unit, CPU), a network Processor (Network Processor, NP), etc., or may be a digital signal Processor (DIGITAL SIGNAL Processor, DSP), application SPECIFIC INTEGRATED Circuit (ASIC), field-Programmable gate array (Field-Programmable GATE ARRAY, FPGA) or other Programmable logic device, discrete gate or transistor logic device, discrete hardware components.
In yet another embodiment of the present application, a computer readable storage medium is provided, in which a computer program is stored, which when executed by a processor, implements the steps of any of the structured query language task execution methods described above.
In yet another embodiment of the present application, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the structured query language task execution method of any of the above embodiments.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a Solid state disk (Solid STATE DISK, SSD), etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application are included in the protection scope of the present application.