[go: up one dir, main page]

CN119357243A - A structured query language task execution method, device and electronic device - Google Patents

A structured query language task execution method, device and electronic device Download PDF

Info

Publication number
CN119357243A
CN119357243A CN202411945264.4A CN202411945264A CN119357243A CN 119357243 A CN119357243 A CN 119357243A CN 202411945264 A CN202411945264 A CN 202411945264A CN 119357243 A CN119357243 A CN 119357243A
Authority
CN
China
Prior art keywords
target
query language
structured query
physical
statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411945264.4A
Other languages
Chinese (zh)
Inventor
杨胜雄
曾锐
胡哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202411945264.4A priority Critical patent/CN119357243A/en
Publication of CN119357243A publication Critical patent/CN119357243A/en
Pending legal-status Critical Current

Links

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本申请实施例提供了一种结构化查询语言任务执行方法、装置及电子设备,涉及数据处理技术领域。方法包括:获取针对目标数据库的目标SQL语句;确定目标SQL语句中涉及的物理目标表,作为第一物理目标表;在各第一物理目标表中,确定不存在于目标数据库中的目标表,作为第二物理目标表;针对每个第二物理目标表,在目标SQL语句中确定第二物理目标表关联的选择语句,并根据选择语句涉及的字段,在目标数据库中生成第二物理目标表;基于目标数据库中存在的物理源数据表和物理目标表,执行目标SQL语句所指示的SQL任务。通过本申请实施例,在物理目标表发生变化时,计算引擎也能够准确的执行任务SQL。

The embodiment of the present application provides a structured query language task execution method, device and electronic device, which relates to the field of data processing technology. The method includes: obtaining a target SQL statement for a target database; determining the physical target table involved in the target SQL statement as the first physical target table; in each first physical target table, determining the target table that does not exist in the target database as the second physical target table; for each second physical target table, determining the selection statement associated with the second physical target table in the target SQL statement, and generating the second physical target table in the target database according to the fields involved in the selection statement; based on the physical source data table and the physical target table existing in the target database, executing the SQL task indicated by the target SQL statement. Through the embodiment of the present application, when the physical target table changes, the computing engine can also accurately execute the task SQL.

Description

Structured query language task execution method and device and electronic equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for executing a structured query language task, and an electronic device.
Background
As the amount of data has exploded, so has the need for efficient and accurate data querying and analysis. In the field of big data processing and analysis, when executing task SQL (Structured Query Language ), because the formats of the source table and the target table supported by the computing engine and the computer are different, it is generally required to convert the data in the physical source table into the format supported by the computing engine, convert the format supported by the computing engine into the format supported by the computer (hereinafter, this process is referred to as a canonical format), and then the computing engine can perform operations such as querying from the physical source table, so as to obtain the target table of the format supported by the computer. However, the SQL needs to specify the formats of the source table and the target table each time the task is executed, and the process is complex. In order to simplify the flow of executing the SQL task, in the related art, a source table corresponding to the physical source table and a target table corresponding to the physical target table are permanently stored in the computing engine, so that a canonical format is not required to be executed when the SQL task is executed each time.
However, if the physical source table and the physical target table are changed, the source table and the target table are permanently stored in the computing engine, and the source table, the target table and the new physical source table and the new physical target table stored in the computing engine are not identical, so that the computing engine cannot execute the SQL task according to the new physical source table and the new physical target table, and the SQL task cannot be executed.
Disclosure of Invention
The embodiment of the application aims to provide a method and a device for executing a structured query language task and electronic equipment, so as to improve the robustness of SQL task execution. The specific technical scheme is as follows:
in a first aspect of an embodiment of the present application, there is provided a method for executing a structured query language task, the method including:
Acquiring a target structured query language statement aiming at a target database;
determining a physical target table involved in the target structured query language statement as a first physical target table;
Determining a target table which does not exist in the target database in each first physical target table as a second physical target table;
For each second physical target table, determining a selection statement associated with the second physical target table in the target structured query language statement, and generating the second physical target table in the target database according to a field related to the selection statement;
and executing the structured query language task indicated by the target structured query language statement based on the physical source data table and the physical target table existing in the target database.
In one possible implementation manner, the determining, as the first physical target table, a physical target table involved in the target structured query language statement includes:
converting the single structured query language statement into SQLNode for each single structured query language statement in the target structured query language statement, each node in SQLNode being used to represent an element in the single structured query language statement;
And traversing each SQLNode respectively, determining each node used for representing the physical target table, and taking the physical target table represented by each determined node as a first physical target table.
In one possible embodiment, before the traversing each SQLNode separately, the method further includes:
determine SQLNode that includes a create view node, wherein the create view node is a node for representing a create view class;
And converting the created view node into a grammar subtree at each determined SQLNode, wherein each node in the grammar subtree is used for representing each insertion class and selection class referencing the created view class represented by the created view node.
In a possible implementation manner, the executing the structured query language task indicated by the target structured query language statement includes:
converting the single structured query language statement into RelNode for each single structured query language statement in the target structured query language statement, each node in RelNode being used to represent a step in a structured query language subtask indicated by the single structured query language statement;
Determining a logic data source table and a logic target table involved in each RelNode as a first logic data source table and a first logic target table;
according to a preset mapping relation between a first field and a second field and a mapping relation between a first field type and a second field type, adjusting the first field existing in the logic data source table and the logic target table to be corresponding second fields, and adjusting the first field type existing in the logic data source table and the logic target table to be corresponding second field types to obtain a second logic data source table and a second logic target table, wherein the first field and the first field type are fields and field types which are not supported by a computing engine, and the second field type are fields and field types which are supported by the computing engine;
And executing the structured query language task indicated by the target structured query language statement according to the second logic data source table and the second logic target table by the computing engine.
In one possible embodiment, the method further comprises:
for each first logic data source table and each first logic target table, acquiring metadata information related to the table in a target database as first metadata information;
According to a preset mapping rule between a first parameter and a second parameter, the first parameter existing in the first metadata information is adjusted to be a corresponding second parameter, so that second metadata information is obtained, wherein the first parameter is a parameter which is not supported by the computing engine, and the second parameter is a parameter which is supported by the computing engine;
The executing, by the computing engine, the structured query language task indicated by the target structured query language statement according to the second logical data source table and the second logical target table, including:
And executing the structured query language task indicated by the target structured query language statement according to the second logic data source table, the second logic target table and the second metadata information by the computing engine.
In a second aspect of the embodiment of the present application, there is provided a structured query language task execution apparatus, including:
The acquisition module is used for acquiring target structured query language sentences aiming at the target database;
the first determining module is used for determining a physical target table involved in the target structured query language statement as a first physical target table;
The second determining module is used for determining a target table which does not exist in the target database in each first physical target table as a second physical target table;
the generation module is used for determining a selection statement related to the second physical target table in the target structured query language statement for each second physical target table, and generating the second physical target table in the target database according to a field related to the selection statement;
And the execution module is used for executing the structured query language task indicated by the target structured query language statement based on the physical source data table and the physical target table existing in the target database.
In a possible implementation manner, the first determining module is specifically configured to:
converting the single structured query language statement into SQLNode for each single structured query language statement in the target structured query language statement, each node in SQLNode being used to represent an element in the single structured query language statement;
And traversing each SQLNode respectively, determining each node used for representing the physical target table, and taking the physical target table represented by each determined node as a first physical target table.
In one possible embodiment, the apparatus further comprises:
A third determining module, configured to determine SQLNode that includes a create view node before traversing each SQLNode, where the create view node is a node used to represent a create view class;
and the node conversion module is used for respectively converting the created view nodes into grammar subtrees at the determined SQLNode, wherein each node in the grammar subtrees is used for representing each insertion class and each selection class of the created view class represented by the created view node.
In a third aspect of an embodiment of the present application, there is provided an electronic device, including:
A memory for storing a computer program;
And the processor is used for realizing the method for executing the structured query language task according to any one of the first aspect when executing the program stored in the memory.
In a fourth aspect of embodiments of the present application, there is provided a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the structured query language task execution method of any of the preceding aspects.
The embodiment of the application also provides a computer program product containing instructions, which when run on a computer, cause the computer to execute the structured query language task execution method described in any one of the above.
The embodiment of the application has the beneficial effects that:
the embodiment of the application provides a structured query language task execution method, which comprises the steps of obtaining target structured query language sentences aiming at a target database, determining target tables which are not in the target database in each first physical target table as first physical target tables according to physical target tables related to the target structured query language sentences, determining selection sentences related to the second physical target tables in the target structured query language sentences according to each second physical target table, generating the second physical target tables in the target database according to fields related to the selection sentences, and finally executing the structured query language task indicated by the target structured query language sentences based on the physical source data tables and the physical target tables in the target database. According to the method provided by the embodiment of the application, the new physical target table can be determined according to the target structured query language statement, so that when a user issues the SQL task, even if the physical target table changes, the new physical target table obtained based on the target structured query language statement can be used, the calculation engine can accurately execute the SQL task according to the physical source data table and the new physical target table existing in the target database, the new physical target table does not need to be created in advance, the calculation engine can accurately execute the SQL task, and the execution robustness of the SQL task is improved.
Of course, it is not necessary for any one product or method of practicing the application to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the application, and other embodiments may be obtained according to these drawings to those skilled in the art.
FIG. 1 is a diagram showing a comparison of a target table/source table and a physical target table/physical source table;
FIG. 2 is a first flowchart of a method for executing a task in a structured query language according to an embodiment of the present application;
FIG. 3 is a second flowchart of a method for executing a task in a structured query language according to an embodiment of the present application;
FIG. 4 is a third flowchart of a method for executing a structured query language task according to an embodiment of the present application;
FIG. 5 is a fourth flowchart of a method for executing a structured query language task according to an embodiment of the present application;
FIG. 6 is a fifth flowchart of a method for executing a structured query language task according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a module for implementing a method for executing a task in a structured query language according to an embodiment of the present application;
FIG. 8 is a workflow diagram of a SQL unified parsing sub-module according to an embodiment of the application;
FIG. 9 is a workflow diagram of a task SQL verification sub-module provided by an embodiment of the application;
FIG. 10 is a workflow diagram of a task SQL generating sub-module according to an embodiment of the application;
FIG. 11 is a schematic structural diagram of a task execution device for structured query language according to an embodiment of the present application;
Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. Based on the embodiments of the present application, all other embodiments obtained by the person skilled in the art based on the present application are included in the scope of protection of the present application.
For a clearer description of the data query method provided by the present application, the following will explain the related terms referred to herein:
SQL, the full scale Structured Query Language, i.e., the structured query language, is a standard programming language that is specifically used to manage and manipulate databases, allowing users to perform queries, updates, and management of data.
SQL execution plan the database management system, upon receiving the SQL query statement, generates a detailed execution step, called SQL execution plan, based on the content queried by the SQL query statement. The SQL execution plan is the core for guiding the database management system to perform operations such as data query, update and management, and determines the efficiency and performance of the query.
A computing engine, a component or framework for processing and executing computing tasks, provides real-time or near real-time data computing capabilities, and often has some metadata management capabilities itself.
Metadata, which is data describing attribute information of the data, is used for supporting functions of indicating storage positions, historical data, resource searching, file recording and the like.
The physical target table is also called a physical sink (output component in data flow processing) table, and refers to a specific data source table for outputting data from a calculation engine in a flow processing flow.
Physical source table, also known as physical source table, refers to a specific data source table of data input calculation engine in stream processing.
The stream processing task SQL is used for submitting the SQL of the task, and is generally composed of three parts, namely a source table, a sink table and a business logic SQL, wherein the sink table is a memory table mapped by a physical sink table of a real data source in a computing engine, the source table is a memory table mapped by a physical source table of the real data source in the computing engine, and the sink table and the source table are composed of table names, fields, field types, data source connection parameters and computing engine high-order parameters.
SQLNode also called SQL nodes are abstract syntax tree nodes of SQL sentences, and represent various constituent parts of the SQL sentences, such as select, from, where, join, etc. SQLNode is the product of the SQL parsing stage, which parses SQL statements into a tree structure for subsequent processing.
RelNode the abstract Node of the Relational expression, which represents the logic execution plan after query optimization, is called the Relational Node, and metadata information of the corresponding source table and the target table is introduced into RelNode.
Business SQL refers to SQL related to data stream processing logic in stream computation, and related to the task itself are CREATE VIEW (create view), select, and insert statements.
In the related art, when a query analysis task is executed, in order to simplify the flow of the query analysis, a physical source table and a physical target table are mapped in a computing engine to obtain a source table and a target table, and the source table and the target table are persisted in the computing engine, as shown in fig. 1, which is a comparison diagram of the target table/source table and the physical target table/physical source table, if the query is required to be performed based on data in the physical source table 1 and the physical source table 2 when the query analysis task is executed, to obtain a physical target table containing a query analysis result, then the source table 1 and the source table 2 corresponding to the physical source table 1 and the physical source table 2 respectively need to be created in the computing engine 100 according to the data in the physical source table, the target table corresponding to the physical target table is created in the computing engine 100, and the source table 1, the source table 2 and the target table are persisted in the computing engine 100, and when the query is executed, the computing engine 100 creates a view based on the source table 1 and the source table 2, and the query is required to be converted and synchronized to the physical target table.
However, if the user query analysis requirement changes, that is, the physical source table and the physical target table change, the source table and the target table are permanently stored in the computing engine, and the source table, the target table, and the new physical source table and the new physical target table stored in the computing engine are not identical, the computing engine cannot create a view according to the new physical source table and the new physical target table and perform query and analysis, so that the SQL task cannot be executed.
In order to improve the robustness of the execution of the SQL task, in a first aspect of the embodiment of the present application, a method for executing a structured query language task is provided, referring to fig. 2, fig. 2 is a first flowchart of the method for executing a structured query language task provided in the embodiment of the present application, where the method includes the following steps:
step S10, acquiring a target SQL statement aiming at a target database;
step S20, determining a physical target table involved in a target SQL statement as a first physical target table;
step S30, determining a target table which does not exist in the target database in each first physical target table as a second physical target table;
Step S40, for each second physical target table, determining a selection statement associated with the second physical target table in the target SQL statement, and generating the second physical target table in the target database according to the field related to the selection statement;
And step S50, executing the SQL task indicated by the target structured query language statement based on the physical source data table and the physical target table existing in the target database.
According to the method provided by the embodiment of the application, the new physical target table can be determined according to the target structured query language statement, so that when a user issues the SQL task, even if the physical target table changes, the new physical target table obtained based on the target structured query language statement can be used, the calculation engine can accurately execute the SQL task according to the physical source data table and the new physical target table existing in the target database, the new physical target table does not need to be created in advance, the calculation engine can accurately execute the SQL task, and the execution robustness of the SQL task is improved.
The above step S10 to step S50 will be described below:
In step S10, the target SQL statement is a statement constructed based on a task that the user wants to perform for a certain database, and according to an operation that the user wants to perform, the SQL statement may include DDL (Data Definition Language ) statements such as a statement for Creating (CREATE), modifying (ALTER) and Deleting (DROP) objects of the database, tables, views, indexes, etc., DML (Data Manipulation Language ) statements such as a statement for performing operations of Inserting (INSERT), updating (UPDATE) and Deleting (DELETE) data in the database, DQL (Data Query Language ) statements whose core statement is a SELECT (SELECT) statement for retrieving data from the database, DCL (Data Control Language ) statements for controlling access rights of the database user.
Databases include, but are not limited to, one or more of MySQL (Relational Database MANAGEMENT SYSTEM ), kafka (an open source stream processing platform), oracle (an information management software), elastosearch (search server), postgreSQL (an object-relational database management system), starRocks (a database system based on massively parallel processing technology).
It will be appreciated that there are multiple tables containing raw data in the target database, i.e., physical source tables, also known as physical source data tables, and that SQL statements are actually statements built based on tasks that the user wants to perform with respect to the source data tables present in the target database. Illustratively, assume a database storing a physical source table named "sales_orders" storing detailed information of all sales orders, such as order numbers, product information, sales dates, etc., through SQL statement "SELECTFROM samples_descriptors "can extract all the data in the physical source table.
In step S20, the physical target table related to the SQL statement may be stored in the database in advance, or may be obtained when the SQL task indicated by the SQL statement is executed.
However, if the physical target table is obtained when the SQL task indicated by the SQL statement is executed, the physical target table may not be present in the target database, and the physical target table needs to be created in the target database.
After determining the target table that does not exist in the target database as the second physical target table in step S30, an actual second physical target table is created in the target database for each second physical target table in step S40.
In step S40, the selection statement associated with the second physical object table refers to a selection statement concerning the content of the second physical object table among the statements. Since the basic grammar of the selection statement is "SELECT [ column_list ] FROM [ table_name ]", where [ column_list ] is a list of columns to be selected, one or more column names may be used, and wild cards may also be usedTo indicate that all columns are selected, [ table_name ] is the name of the physical source table from which the data is to be retrieved.
The field referred to by the SELECT statement is "[ column_list ]" in the SELECT statement, and since the SELECT statement is a list of columns to be selected from the physical source table as a part of the contents of the second physical destination table, if the second physical destination table does not exist in the destination database, the structure of the second physical destination table can be determined based on the contents retrieved by the SELECT statement from the physical source table, that is, the second physical destination table is generated in the destination database.
For example, if the selection statement associated with the second physical object table is "select_id FROM orders1", "select_ amount FROM orders2", and "order_id" indicates an order number and "order_count" indicates an order amount, the second physical object table includes the order number and the order amount.
In step S50, the SQL task, i.e., a task that the user wants to execute with respect to the source data table existing in the target database, includes a data query task (SQL statement is a SELECT statement), a data insertion task (SQL statement is an INSERT statement), a data UPDATE task (SQL statement is an UPDATE statement), a data deletion task (SQL statement is a DELETE statement), and the like.
There may be only one or more physical target tables involved in the target SQL statement. In the case where the target SQL statement contains multiple SQL statements, there may be more first physical target tables involved in the multiple SQL statements, and determining the physical target tables involved in the SQL statements is time consuming.
Based on this, in a possible implementation manner, as shown in fig. 3, a second flowchart of a structured query language task execution method provided by an embodiment of the present application, the step S20 specifically includes:
Step S201, converting the single SQL statement into SQLNode, SQLNode aiming at each single SQL statement in the target SQL statement, wherein each node in SQLNode, SQLNode is used for representing one element in the single SQL statement;
step S203, each SQLNode is traversed, each node representing the physical object table is determined, and the physical object table represented by each determined node is used as the first physical object table.
The target SQL statement may include only a single SQL statement or may include a plurality of SQL statements. In the case where the target SQL statement is a compound statement composed of a plurality of SQL statements, it is necessary to disassemble the target SQL statement into a plurality of single SQL statements.
Where a single SQL statement refers to a statement that contains only one complete and independent SQL command that is executed at a time, the single SQL statement includes, but is not limited to, a single SELECT statement, a single INSERT statement, a single UPDATE statement, a single DELETE statement, a single CREATE statement. Illustratively, the SQL statement is DELETE FROM students WHERE age < 18, meaning that all student records with ages less than 18 in the "students" table are deleted, which is a single SQL statement and a single DELETE statement, which is used only for data deletion tasks.
Each node SQLNode is used to represent an element in a single SQL statement, and, taking the foregoing statement "DELETE FROM students WHERE age < 18" as an example, "DELETE", "FROM", "WHERE" respectively represent an element of the SQL statement.
By adopting the method of the embodiment of the application, the SQL sentence is converted into SQLNode, and as each node of SQLNode is used for representing one element in a single SQL sentence, the physical target table related to the SQL sentence can be determined and obtained only by traversing the node representing the physical target table in SQLNode, and sentence-by-sentence analysis is not required for each target SQL sentence.
CREATVIEW (create view) statements are used to create a view in the database that is a virtual table based on the results of the SQL query. Since the view does not store the actual data, but only generates data dynamically according to the SQL query, the grammar of CREATVIEW statements must be correct and correct to ensure that the view can be properly created and the query results returned.
In the related art, CREATVIEW sentences are executed in a computing engine, and after a memory table is built, the DML sentences referencing the view can be checked.
Since the basic syntax for creating a view sentence is CREATE VIEW VIEW _ NAME AS SELECT column1, column 2. WHERE view_name is the name of the view that the user wants to create, SELECT column1, column 2. FROM table_name formulates that the view is based on that table query, WHERE condition is an optional filtering operation.
The field behind the field 'CREATE VIEW VIEW _name AS' in the created view sentence is actually a SELECT sentence, and checking the sentence behind the field 'CREATE VIEW VIEW _name AS' in the view sentence is equivalent to checking the created view sentence when checking whether the created view sentence is error-free. Because the view is essentially a virtual table based on SQL query results, SELECT statements are typically included in creating the view statements.
While checking the SELECT statement or INSERT statement does not rely on the calculation engine, in one possible implementation, the create view statement may be rewritten as a SELECT statement prior to performing step S203 described above.
Specifically, as shown in fig. 4, a third flowchart of a method for executing a structured query language task according to an embodiment of the present application is shown, where the method further includes:
Step S2021, determining SQLNode including creating a view node;
wherein the view creation node is a node for representing creation of view class;
Step S2022, converting the created view nodes into grammar subtrees at each SQLNode determined;
Wherein each node in the grammar sub-tree is used for representing each insertion class and selection class of the creation view class represented by the reference creation view node, namely an INSERT class and a SELECT class.
By adopting the method provided by the embodiment of the application, in SQLNode containing the view creating node, the view creating node is converted into the grammar subtree for representing the SELECT statement, so that the capability of analyzing and checking the view creating statement without depending on a calculation engine is realized, and the flexibility and efficiency of SQL query are improved.
It will be appreciated that the data types in the physical source tables and physical destination tables stored in the destination database are different from the data types that the compute engine can identify, and therefore, the physical source tables and physical destination tables need to be converted into the data types that the compute engine can identify before the compute engine performs the SQL task.
Referring to fig. 5, which is a fourth flowchart of a method for executing a structured query language task according to an embodiment of the present application, the step S50 of executing the structured query language task indicated by the target structured query language sentence specifically includes:
Step S501, converting the single SQL statement into RelNode, relNode aiming at each single SQL statement in the target SQL statement, wherein each node in RelNode, relNode is used for representing one step in the SQL subtask indicated by the single SQL statement;
A single SQL statement may be converted to RelNode, e.g., calcite (a data query engine may convert any data query to an SQL-based query) using existing tools. Specifically, a single SQL statement may first be converted SQLNode using the Calcite SQL parser, and since the structure of the physical object table has been determined based on the above and the structure of the physical source table is known, calcite may access the metadata of the physical source table and physical object table for validation, and after validation pass, use SqlToRelConverter (a key component of Calcite) of Calcite to convert SQLNode to logic RelNode.
Step S503, determining a logic data source table and a logic target table related in each RelNode as a first logic data source table and a first logic target table;
Step S505, according to the preset mapping relation between the first field and the second field and the mapping relation between the first field type and the second field type, the first field existing in the logic data source table and the logic target table is adjusted to be the corresponding second field, and the existing first field type is adjusted to be the corresponding second field type, so that a second logic data source table and a second logic target table are obtained;
the first field and the first field type are fields and field types which are not supported by the computing engine, and the second field type are fields and field types which are supported by the preset computing engine;
and S507, executing the SQL task indicated by the target SQL statement according to the second logic data source table and the second logic target table by the computing engine.
By adopting the method of the embodiment of the application, the field types in the logic data source table and the logic target table related in RelNode are adjusted so that the field types in the logic data source table and the logic target table are field types which can be identified by the computing engine, and the problem that SQL tasks cannot be normally executed because the field types in the logic data source table and the logic target table related in RelNode are types which are not identified by the computing engine is avoided.
It can be understood that the mapping relation between the field types supported by the computing engine and the field types that cannot be identified by the computing engine is pre-stored in the target database, and after the logical data source table and the logical target table are obtained, parameters in metadata information related to the table can be adjusted to parameters supported by the computing engine according to the mapping relation pre-stored in the target database.
Referring to fig. 6, fig. 6 is a fifth flowchart of a method for executing a task in a structured query language according to an embodiment of the present application, where the method further includes the following steps:
step S5041, aiming at each first logic data source table and each first logic target table, acquiring metadata information related to the table in a target database as first metadata information;
step S5042, according to a mapping rule between a preset first parameter and a preset second parameter, the first parameter existing in the first metadata information is adjusted to a corresponding second parameter, and second metadata information is obtained;
the first parameter is a parameter which is not supported by the computing engine, and the second parameter is a parameter which is supported by the computing engine;
the step S507 specifically includes:
In step S5071, the structured query language task indicated by the target structured query language statement is executed by the computing engine according to the second logical data source table, the second logical target table, and the second metadata information.
By adopting the method of the embodiment of the application, the logical data source table and the logical target table can be converted into the data types which can be identified by the computing engine because the mapping rule between the parameters supported by the computing engine and the parameters not supported by the computing engine is preset, and the SQL task can be accurately executed when the physical source table and the physical target table are changed.
Based on the foregoing, the method for executing the structured query language task provided by the embodiment of the present application may be considered to be completed in five modules respectively. Fig. 7 is a schematic structural diagram of a module for implementing a task execution method of a structured query language according to an embodiment of the present application, where the module includes an SQL unified parsing sub-module 11, an SQL verification sub-module 12, a task SQL generation sub-module 13, a task submission sub-module 14, and a data collection and storage sub-module 15. The SQL unified analysis sub-module 11, the SQL verification sub-module 12 and the task SQL generation sub-module 13 form a metadata management unit 1 which is arranged outside the computing engine, the metadata management unit 1 can provide the capability of performing SQL analysis and verification independently of the computing engine, and can manage the metadata information of other data sources, so that a user does not need to pay attention to the problems of data mapping between the data sources and the computing engine, data parameter mapping of the data sources, whether a physical target table exists or not, and the like.
As can be seen from fig. 7, in the process of executing the SQL task, the SQL unified analysis sub-module 11, the SQL verification sub-module 12, the task SQL generation sub-module 13, and the task submission sub-module 14 continuously acquire data from the data acquisition and storage sub-module 15. In the embodiment of the application, in the process of executing SQL tasks by the SQL unified analysis sub-module 11, the SQL verification sub-module 12, the task SQL generation sub-module 13 and the task submission sub-module 14, firstly, the SQL unified analysis sub-module 11 executes SQL segmentation and SQL analysis to obtain SQLNode, then the SQL verification sub-module 12 executes sink table verification and sink table generation based on SQLNode to obtain RelNode, then the task SQL generation sub-module 13 executes field type mapping, metadata information mapping and high-order parameter mapping, converts a logic data source table and a logic target table related to RelNode into data supported by a computing engine, and finally, the task submission sub-module 14 executes task SQL verification, task parameter filling and SQL task and task parameter submission to the computing engine.
The following describes the SQL unified parsing sub-module 11, the SQL verification sub-module 12, the task SQL generation sub-module 13, the task submission sub-module 14, and the data acquisition and storage sub-module 15, respectively:
(1) Data acquisition and storage sub-module 15
The sub-module corresponds to the target database in the foregoing, the task generates the core data support of the sub-module, and metadata information of the data source (i.e. heterogeneous data source metadata in fig. 7), the mapping of field types of data source table information and computing engine data types, the mapping of table names, the mapping of connection parameters and other key rules (i.e. the mapping rule of the data source and the computing engine in fig. 7) and the high-level parameters preset by the user through the front-end interface (i.e. the high-level parameter information in fig. 7) are stored in the sub-module.
The information stored in the data acquisition and storage sub-module 15 is continuously stored and utilized during execution of the SQL task.
(2) SQL unified parsing sub-module 11
The SQL unified parsing sub-module 11 functions to parse one or more target SQL statements of input into an abstract syntax tree SQLNode. For example, CREATEVIEW statements would be converted to SQLCREATEVIEW classes, INSERT statements would be converted to SQLINSERT classes, and SELECT statements would be converted to SQLSELECT classes. The submodule has an SQL statement rewriting function, and can reconstruct CREATEVIEW statements, INSERT statements and SELECT statements.
The workflow of the SQL unified parsing sub-module 11 is shown in FIG. 8. First, a target SQL statement composed of a plurality of SQL statements submitted by a user (corresponding to the step S10) is acquired and finely divided into a plurality of single SQL statements, and each single SQL statement is parsed into SQLNode item by means of CALCITE PARSER (corresponding to the step S201), so as to obtain SQLNodelist corresponding to the target SQL statement. Then traversing whether the SQLNodelist corresponding to the target SQL statement contains SQLNode containing CREATVIEW statement, if not, directly temporarily storing SQLNodelist corresponding to the target SQL statement in the memory of the service program running the metadata management unit, clearing the service program after the SQL task is finished, if yes, further checking whether other SQLSELECT statement or SQLINSERT statement refers to the VIEW, if yes, reconstructing CREATVIEW statement into sub-query statement (SQLSELECT statement or SQLINSERT statement), merging with corresponding SQLSELECT statement or SQLINSERT statement (corresponding to step S2021-step S2022), finally obtaining SQLSELECT statement set or SQLINSERT statement set, if not, directly temporarily storing SQLNodelist corresponding to the target SQL statement in the memory of the service program running the metadata management unit, clearing the service program after the SQL task is finished.
(3) Task SQL verification sub-module 13
See FIG. 9 for a workflow diagram of a task SQL check sub-module. After receiving SQLSELECT statement sets or SQLINSERT statement sets output by the SQL unified analysis sub-module 11, the sub-module uses SqlVisitor visitors to traverse all physical target tables in SQLNodelist (corresponding to the step S20 described above), combines the metadata information of the data sources of the data acquisition and storage sub-module 15 to determine whether the physical target tables actually have data sources (corresponding to the step S30 described above), if so, calls an interface to acquire the metadata information of all physical target tables and physical source tables from the data acquisition and storage sub-module 15, performs SQL verification, and then converts SQLNode into RelNode (corresponding to the step S501 described above).
If not, then the fields in SQLINSERT statements for the physical target table need to be removed, the SQLINSERT statement converted to SQLSELECT statement, and the SQLSELECT statement converted to RelNode in combination with the metadata information of the physical source table. Finally, generating a table construction statement of the target table by utilizing the metadata information of the data source in the data acquisition and storage sub-module 15, and after the user determines to execute the SQL, the physical target table in the data acquisition and storage sub-module 15 is already present, and calling an interface to acquire the metadata information of all the physical target tables and the physical source tables from the data acquisition and storage sub-module 15, and converting SQLNode into RelNode.
And finally, temporarily storing all RelNode sets in a memory, and cleaning the SQL task after the SQL task is generated subsequently.
(4) Task SQL generating sub-module 13
See FIG. 10 for a workflow diagram of the task SQL generating sub-module. The submodule acquires the logical data source table and the logical target table (corresponding to the step S503) related in RelNode according to the RelNode set output by the task SQL generating submodule 13, and then adjusts the fields and the field types existing in the logical data source table and the logical target table to the fields and the field types which can be identified by the calculation engine (corresponding to the step S505) so as to ensure that the data source data can be correctly identified by the calculation engine. Then, metadata information related to the table in the data collection and storage sub-module 15, such as parameters in an SQL statement (corresponding to the foregoing step S5041), is obtained, and based on a preset mapping rule, the metadata information of the data source is mapped (i.e. the data source data information mapping in fig. 10) and converted into a writing method specified by the computing engine (corresponding to the foregoing step S5042), so as to generate parameters required by the computing engine to connect to the corresponding data source, for example, a with parameter in the link, a parameter connected to the mysql database, url (uniform resource locator ), driver (driver), and the like. And finally, converting the high-order parameters supplemented by the user obtained by the front end into an engine specified writing method according to a preset mapping rule, supplementing the engine specified writing method into the task SQL, generating the high-order parameters of the task SQL, and further obtaining the final complete task SQL.
Wherein the high-order parameters are tuning parameters located between the compute engine and the data source that do not affect the connection of the compute engine to the data source. The high-order parameters are usually default values, but some users can finely adjust the parameters according to different task requirements. The high-order parameters are divided into source table high-order parameters and target table high-order parameters of the data source. The source table high-order parameters of the data source include a scan partition number parameter (scan. Partition. Num), a scan partition column parameter (scan. Partition. Column), a cache policy parameter (cache), and the target table high-order parameters include an insertion parallelism (sink. Parallel), a cache refresh interval time (sink. Buffer-flush. Interval), a cache maximum storage line number (sink. Buffer-flush. Max-rows), and the like. It will be appreciated that the higher order parameters of different data sources may be different.
(5) Task submission submodule 14
After the task SQL is generated, the task SQL is returned to the front end for verification by the user, and the computing engine also performs verification on the task SQL through the verification interface. After the verification is completed, specific task parameters are set for users to perform task optimization in different degrees before each task SQL is submitted, and after the users perform task parameter configuration through a front-end interface, the sub-module submits the tasks to a computing engine, and the computing engine executes the tasks.
Corresponding to the foregoing first aspect, a second aspect of the embodiment of the present application provides a structured query language task execution device, and as shown in fig. 11, the structured query language task execution device provided by the embodiment of the present application includes:
an acquisition module 1101, configured to acquire a target structured query language sentence for a target database;
A first determining module 1102, configured to determine a physical target table involved in the target structured query language statement as a first physical target table;
a second determining module 1103, configured to determine, in each of the first physical target tables, a target table that does not exist in the target database, as a second physical target table;
A generating module 1104, configured to determine, for each of the second physical target tables, a selection statement associated with the second physical target table in the target structured query language statement, and generate the second physical target table in the target database according to a field related to the selection statement;
An execution module 1105, configured to execute the structured query language task indicated by the target structured query language statement based on the physical source data table and the physical target table existing in the target database.
According to the device provided by the embodiment of the application, the new physical target table can be determined according to the target structured query language statement, so that when a user issues the SQL task, even if the physical target table changes, the new physical target table obtained based on the target structured query language statement can be used, the calculation engine can accurately execute the SQL task according to the physical source data table and the new physical target table existing in the target database, the new physical target table does not need to be created in advance, the calculation engine can accurately execute the SQL task, and the execution robustness of the SQL task is improved.
In a possible implementation manner, the first determining module 1102 is specifically configured to:
converting the single structured query language statement into SQLNode for each single structured query language statement in the target structured query language statement, each node in SQLNode being used to represent an element in the single structured query language statement;
And traversing each SQLNode respectively, determining each node used for representing the physical target table, and taking the physical target table represented by each determined node as a first physical target table.
In one possible embodiment, the apparatus further comprises:
a third determining module, configured to determine SQLNode that includes a created view node before traversing each SQLNode separately, where the created view node is a node used to represent a created view class;
and the node conversion module is used for respectively converting the created view nodes into grammar subtrees at the determined SQLNode, wherein each node in the grammar subtrees is used for representing each insertion class and each selection class of the created view class represented by the created view node.
The embodiment of the application also provides an electronic device, as shown in fig. 12, including:
A memory 1201 for storing a computer program;
the processor 1202, when executing the program stored in the memory 1201, performs the following steps:
acquiring a target SQL statement aiming at a target database;
determining a physical target table involved in a target SQL statement as a first physical target table;
determining a target table which does not exist in the target database in each first physical target table as a second physical target table;
for each second physical target table, determining a selection statement associated with the second physical target table in the target SQL statement, and generating the second physical target table in the target database according to a field related to the selection statement;
Based on the physical source data table and the physical target table existing in the target database, executing the SQL task indicated by the target structured query language statement.
The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The Processor may be a general-purpose Processor including a central processing unit (Central Processing Unit, CPU), a network Processor (Network Processor, NP), etc., or may be a digital signal Processor (DIGITAL SIGNAL Processor, DSP), application SPECIFIC INTEGRATED Circuit (ASIC), field-Programmable gate array (Field-Programmable GATE ARRAY, FPGA) or other Programmable logic device, discrete gate or transistor logic device, discrete hardware components.
In yet another embodiment of the present application, a computer readable storage medium is provided, in which a computer program is stored, which when executed by a processor, implements the steps of any of the structured query language task execution methods described above.
In yet another embodiment of the present application, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the structured query language task execution method of any of the above embodiments.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a Solid state disk (Solid STATE DISK, SSD), etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (10)

1. A method of structured query language task execution, the method comprising:
Acquiring a target structured query language statement aiming at a target database;
determining a physical target table involved in the target structured query language statement as a first physical target table;
Determining a target table which does not exist in the target database in each first physical target table as a second physical target table;
For each second physical target table, determining a selection statement associated with the second physical target table in the target structured query language statement, and generating the second physical target table in the target database according to a field related to the selection statement;
and executing the structured query language task indicated by the target structured query language statement based on the physical source data table and the physical target table existing in the target database.
2. The method of claim 1, wherein said determining a physical target table involved in the target structured query language statement as a first physical target table comprises:
converting the single structured query language statement into SQLNode for each single structured query language statement in the target structured query language statement, each node in SQLNode being used to represent an element in the single structured query language statement;
And traversing each SQLNode respectively, determining each node used for representing the physical target table, and taking the physical target table represented by each determined node as a first physical target table.
3. The method of claim 2, wherein prior to said traversing each of said SQLNode separately, said method further comprises:
determine SQLNode that includes a create view node, wherein the create view node is a node for representing a create view class;
And converting the created view node into a grammar subtree at each determined SQLNode, wherein each node in the grammar subtree is used for representing each insertion class and selection class referencing the created view class represented by the created view node.
4. The method of claim 1, wherein the executing the structured query language task indicated by the target structured query language statement comprises:
converting the single structured query language statement into RelNode for each single structured query language statement in the target structured query language statement, each node in RelNode being used to represent a step in a structured query language subtask indicated by the single structured query language statement;
Determining a logic data source table and a logic target table involved in each RelNode as a first logic data source table and a first logic target table;
according to a preset mapping relation between a first field and a second field and a mapping relation between a first field type and a second field type, adjusting the first field existing in the logic data source table and the logic target table to be corresponding second fields, and adjusting the first field type existing in the logic data source table and the logic target table to be corresponding second field types to obtain a second logic data source table and a second logic target table, wherein the first field and the first field type are fields and field types which are not supported by a computing engine, and the second field type are fields and field types which are supported by the computing engine;
And executing the structured query language task indicated by the target structured query language statement according to the second logic data source table and the second logic target table by the computing engine.
5. The method according to claim 4, wherein the method further comprises:
for each first logic data source table and each first logic target table, acquiring metadata information related to the table in a target database as first metadata information;
According to a preset mapping rule between a first parameter and a second parameter, the first parameter existing in the first metadata information is adjusted to be a corresponding second parameter, so that second metadata information is obtained, wherein the first parameter is a parameter which is not supported by the computing engine, and the second parameter is a parameter which is supported by the computing engine;
The executing, by the computing engine, the structured query language task indicated by the target structured query language statement according to the second logical data source table and the second logical target table, including:
And executing the structured query language task indicated by the target structured query language statement according to the second logic data source table, the second logic target table and the second metadata information by the computing engine.
6. A structured query language task execution device, the device comprising:
The acquisition module is used for acquiring target structured query language sentences aiming at the target database;
the first determining module is used for determining a physical target table involved in the target structured query language statement as a first physical target table;
The second determining module is used for determining a target table which does not exist in the target database in each first physical target table as a second physical target table;
the generation module is used for determining a selection statement related to the second physical target table in the target structured query language statement for each second physical target table, and generating the second physical target table in the target database according to a field related to the selection statement;
And the execution module is used for executing the structured query language task indicated by the target structured query language statement based on the physical source data table and the physical target table existing in the target database.
7. The apparatus of claim 6, wherein the first determining module is specifically configured to:
converting the single structured query language statement into SQLNode for each single structured query language statement in the target structured query language statement, each node in SQLNode being used to represent an element in the single structured query language statement;
And traversing each SQLNode respectively, determining each node used for representing the physical target table, and taking the physical target table represented by each determined node as a first physical target table.
8. The apparatus of claim 7, wherein the apparatus further comprises:
A third determining module, configured to determine SQLNode that includes a create view node before traversing each SQLNode, where the create view node is a node used to represent a create view class;
and the node conversion module is used for respectively converting the created view nodes into grammar subtrees at the determined SQLNode, wherein each node in the grammar subtrees is used for representing each insertion class and each selection class of the created view class represented by the created view node.
9. An electronic device, comprising:
A memory for storing a computer program;
A processor for implementing the method of any of claims 1-5 when executing a program stored on a memory.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the method of any of claims 1-5.
CN202411945264.4A 2024-12-26 2024-12-26 A structured query language task execution method, device and electronic device Pending CN119357243A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411945264.4A CN119357243A (en) 2024-12-26 2024-12-26 A structured query language task execution method, device and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411945264.4A CN119357243A (en) 2024-12-26 2024-12-26 A structured query language task execution method, device and electronic device

Publications (1)

Publication Number Publication Date
CN119357243A true CN119357243A (en) 2025-01-24

Family

ID=94318280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411945264.4A Pending CN119357243A (en) 2024-12-26 2024-12-26 A structured query language task execution method, device and electronic device

Country Status (1)

Country Link
CN (1) CN119357243A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704479A (en) * 2019-09-12 2020-01-17 新华三大数据技术有限公司 Task processing method and device, electronic equipment and storage medium
CN110928869A (en) * 2018-09-04 2020-03-27 深圳市超脑云信息技术有限公司 Data warehouse table self-growing method, terminal and medium
US20200334242A1 (en) * 2019-04-16 2020-10-22 Snowflake Inc. Automated maintenance of external tables in database systems
CN116303542A (en) * 2023-03-23 2023-06-23 京东科技控股股份有限公司 SQL sentence execution method and device, computer equipment and storage medium
CN118260290A (en) * 2024-03-04 2024-06-28 中国第一汽车股份有限公司 Data warehouse development method and device and electronic equipment
CN119025547A (en) * 2024-08-30 2024-11-26 北京洛塔信息技术有限公司 A structured query language conversion method, system, device and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110928869A (en) * 2018-09-04 2020-03-27 深圳市超脑云信息技术有限公司 Data warehouse table self-growing method, terminal and medium
US20200334242A1 (en) * 2019-04-16 2020-10-22 Snowflake Inc. Automated maintenance of external tables in database systems
CN110704479A (en) * 2019-09-12 2020-01-17 新华三大数据技术有限公司 Task processing method and device, electronic equipment and storage medium
CN116303542A (en) * 2023-03-23 2023-06-23 京东科技控股股份有限公司 SQL sentence execution method and device, computer equipment and storage medium
CN118260290A (en) * 2024-03-04 2024-06-28 中国第一汽车股份有限公司 Data warehouse development method and device and electronic equipment
CN119025547A (en) * 2024-08-30 2024-11-26 北京洛塔信息技术有限公司 A structured query language conversion method, system, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭玉栋等: "SQL语言查询优化方案探究", 晋中学院学报, 31 July 2006 (2006-07-31), pages 18 - 20 *

Similar Documents

Publication Publication Date Title
US11907247B2 (en) Metadata hub for metadata models of database objects
JP6720641B2 (en) Data constraint of multilingual data tier
US8615526B2 (en) Markup language based query and file generation
US20030105745A1 (en) Text-file based relational database
US20100017395A1 (en) Apparatus and methods for transforming relational queries into multi-dimensional queries
US20050010550A1 (en) System and method of modelling of a multi-dimensional data source in an entity-relationship model
US9390127B2 (en) RDF graphs made of RDF query language queries
CN106934062A (en) A kind of realization method and system of inquiry elasticsearch
US20210209098A1 (en) Converting database language statements between dialects
US7792851B2 (en) Mechanism for defining queries in terms of data objects
US11561976B1 (en) System and method for facilitating metadata identification and import
US8073843B2 (en) Mechanism for deferred rewrite of multiple XPath evaluations over binary XML
US20230091845A1 (en) Centralized metadata repository with relevancy identifiers
CN110019314A (en) Dynamic data packaging method, client and server-side based on data item analysis
US7693857B2 (en) Clinical genomics merged repository and partial episode support with support abstract and semantic meaning preserving data sniffers
CN114880483A (en) A metadata knowledge graph construction method, storage medium and system
CN114116767A (en) Method and device for converting SQL (structured query language) query statement of database
CN115905164A (en) Identification and import of extended metadata for database artifacts
US20200320069A1 (en) Hybrid compilation framework for arbitrary ad-hoc imperative functions in database queries
JP2024504556A (en) Systems and methods for accessing data entities managed by a data processing system
US20050060307A1 (en) System, method, and service for datatype caching, resolving, and escalating an SQL template with references
CN119166740A (en) Knowledge base construction method, data processing method, device, storage medium and program product
EP4170516A1 (en) Metadata elements with persistent identifiers
US20240134849A1 (en) Virtual access to parameterized data objects
US10331715B2 (en) Metadata enrichment with a keyword definition editor

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination