US12475088B2 - Data quality check using a structured query language hint - Google Patents
Data quality check using a structured query language hintInfo
- Publication number
- US12475088B2 US12475088B2 US18/362,882 US202318362882A US12475088B2 US 12475088 B2 US12475088 B2 US 12475088B2 US 202318362882 A US202318362882 A US 202318362882A US 12475088 B2 US12475088 B2 US 12475088B2
- Authority
- US
- United States
- Prior art keywords
- query
- database
- quality check
- data
- data quality
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24535—Query rewriting; Transformation of sub-queries or views
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24537—Query rewriting; Transformation of operators
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24542—Plan optimisation
Definitions
- data quality checks processes of evaluating a quality, accuracy, completeness, consistency, and/or validity of data stored in the database, such as within one or more target data tables—may be implemented. By performing data quality checks, database administrators can identify and rectify data issues, improve data integrity, and maintain a high level of data quality.
- a query optimizer of a database management system may receive a data quality check instruction as part of a database query statement for a database query.
- the query optimizer may modify a query execution plan for the database query based on the data quality check instruction so that completion of the database query is dependent on a result of a data quality check defined by the data quality check instruction.
- the database management system may execute the modified query execution plan.
- FIG. 1 is an illustration of a digital medium environment in an example implementation that is operable to employ data quality check techniques described herein.
- FIG. 2 depicts an example implementation showing operation of the query optimizer and the execution engine of the database management system of FIG. 1 in greater detail.
- FIG. 3 is a flow diagram depicting an algorithm as a step-by-step procedure in an example of implementing a data quality check using a query hint.
- FIG. 4 is a flow diagram depicting an algorithm as a step-by-step procedure in an example of implementing a data quality check for a single record in a single table of a database while executing a database query.
- FIG. 5 is a flow diagram depicting an algorithm as a step-by-step procedure in an example of implementing a duplicate check of a table of a database while executing a database query.
- FIG. 6 is a flow diagram depicting an algorithm as a step-by-step procedure in an example of implementing a data quality check for records duplication while executing a join operation of a database query.
- FIG. 7 is a flow diagram depicting an algorithm as a step-by-step procedure in an example of implementing a data quality check for an overlapping record between two target data tables while executing a database query.
- FIG. 8 is a flow diagram depicting an algorithm as a step-by-step procedure in an example of implementing a data quality check including a scalar subquery while executing a database query.
- FIG. 9 illustrates an example system including various components of an example device that can be implemented as any type of computing device as described and/or utilized with reference to FIGS. 1 - 8 to implement embodiments of the techniques described herein.
- Data quality impacts the reliability, usability, and effectiveness of data stored in a database.
- data quality refers to an accuracy, completeness, consistency, timeliness, and relevance of the data.
- Data quality checks may be implemented in order to assess the data quality of one or more data tables of the database.
- Existing techniques for performing data quality checks of a database include using dedicated scripts (e.g., SQL statements) to perform the data quality check that are separate from other database queries.
- the data quality check script may be integrated into the middle of an extract, transform, load (ETL) pipeline.
- ETL extract, transform, load
- the data quality check script may operate as part of an isolated pipeline. In both of these examples, the data quality check runs as an individual job in the pipeline.
- performing a data quality check as an individual job results in wasted resources by rescanning tables that are accessed via dependent jobs.
- the data quality check may run after a dependent job completes. This increases processing times as well as computing resource usage.
- inputting the data quality check into the middle of the ETL pipeline can block the pipeline if the data quality check fails, it mixes the data quality check rules with ETL logic in a way that is complicated and disruptive of data processing flows.
- running the data quality check in an isolated pipeline does not block ETL jobs from completing in response to the data quality check failing, thus enabling inaccurate or unreliable results to be generated.
- these script-based data quality checks are not reusable. As such, new scripts are generated for each data quality check, which further increases processing times and negatively impacts data processing efficiency.
- a data quality check using a structured query language hint is described.
- This technique supports the use of reusable data quality check rules that are identified in a hint within a query, such as based on a keyword or other identification in the hint.
- a hint is a comment or annotation that provides additional information to influence generation of a query execution plan by a query optimizer.
- the query optimizer receives the query, which includes a data quality check hint, identifies the specific data quality check rules referenced in the hint, and constructs the query execution plan by loading configurable logic and parameters for the specified data quality check rules.
- the data quality check and the query itself are executed together, via a single query execution plan.
- processing times and computing resource usage are decreased compared to when the data quality check is executed as a stand-alone script that is separate from the query.
- the data quality check rules are configured to add a condition to the query execution plan that makes execution of the query dependent on a result of the data quality check.
- the data quality check rules add a conditional clause to the query execution plan so that the query execution plan completes when the data quality check succeeds and fails when the data quality check fails.
- the techniques described herein block other data retrieval and/or manipulation operations specified in the query (e.g., that are not part of the data quality check) from being performed on inaccurate or unreliable data. As a result, data accuracy is efficiently maintained without disrupting data processing flows.
- the data quality check rule can be efficiently configured, e.g., via a database management user interface.
- a database administrator may define or otherwise select, via the database management user interface, parameters that specify the logic and condition(s) of respective data quality check rules in order to differentiate high-quality data (e.g., data that pass the data quality check) from low-quality data (e.g., data that fail the data quality check).
- the database management user interface is configured to enable the database administrator to define new check rules, adjust the parameters of existing check rules, to activate or deactivate specific check rules, and so forth.
- the data quality check rules may be stored in a storage device that is accessible by the query optimizer so that the query optimizer retrieves a most recently updated version of a given data quality check rule while generating the query execution plan.
- a plurality of different examples of data quality check rules are described herein that perform different types of data quality checks, including a record check instruction, a duplicate check instruction, a duplicate join instruction, a reference check instruction, and a scalar query instruction.
- the record check instruction includes a condition that a target record is present and not empty (e.g., null) in a target data table. As such, the data quality check succeeds in response to the target record being present in the target data table and fails in response to the target record being absent from the target data table.
- the duplicate check instruction includes a condition that the target data table has unique records, without duplicates. As such, the data quality check succeeds in response to no duplicate records being found in the target data table and fails in response to duplicate records being found in the target data table.
- the duplicate join instruction includes a condition that a projected join result of two target data tables has unique records, without duplicates. As such, the data quality check succeeds in response to no duplicate records being found in the projected join result and fails in response to duplicate records being found in the projected join result.
- the reference check instruction includes a condition that a reference from a first data table is present in a second data table. As such, the data quality check succeeds in response to the reference from the first data table being present in the second data table and fails in response to the reference from the first data table being absent from the second data table.
- the scalar query instruction includes a condition that a scalar generated via a pre-programmed subquery coverts to a “true” Boolean according to pre-defined conversion rules.
- the data quality check succeeds in response to the scalar converting to a “true” Boolean (e.g., a first value of a binary) and fails in response to the scalar converting to a “false” Boolean (e.g., a second value of the binary) according to the pre-defined conversion rules.
- a “true” Boolean e.g., a first value of a binary
- a “false” Boolean e.g., a second value of the binary
- FIG. 1 is an illustration of a digital medium environment 100 in an example implementation that is operable to employ data quality check techniques described herein.
- the illustrated environment 100 includes a service provider system 102 , a computing device 104 , and a plurality of client devices 106 that are communicatively coupled, one to another, via a network 108 .
- Computing devices that implement the service provider system 102 , the computing device 104 , and the client devices 106 are configurable in a variety of ways.
- a computing device for instance, is configurable as a desktop computer, a laptop computer, a mobile device (e.g., assuming a handheld configuration such as a tablet or mobile phone), and so forth.
- a computing device ranges from full resource devices with substantial memory and processor resources (e.g., personal computers, game consoles) to a low-resource device with limited memory and/or processing resources (e.g., mobile devices).
- a computing device is also representative of a plurality of different devices, such as multiple servers utilized by a business to perform operations “over the cloud” as illustrated for the service provider system 102 and as described with respect to FIG. 9 .
- the service provider system 102 includes an executable service platform 110 .
- the executable service platform 110 is configured to implement and manage access to digital services 112 “in the cloud” that are accessible by the client devices 106 via the network 108 .
- the executable service platform 110 provides an underlying infrastructure to manage execution of the digital services 112 , e.g., through control of underlying computational resources.
- the executable service platform 110 supports numerous computational and technical advantages, including an ability of the service provider system 102 to readily scale resources to address wants of an entity associated with the client devices 106 .
- cloud computing instead of incurring an expense of purchasing and maintaining proprietary computer equipment for performing specific computational tasks, cloud computing provides the client devices 106 with access to a wide range of hardware and software resources so long as the client has access to the network 108 .
- the digital services 112 can take a variety of forms. Examples of digital services include social media services, document management services, storage services, media streaming services, content creation services, productivity services, digital marketplace services, auction services, and so forth.
- the digital services 112 include a database management system 114 .
- the database management system 114 includes functionality for the creation, organization, management, and manipulation of at least one database in a manner that enables users and applications to store, retrieve, and manipulate data efficiently and accurately.
- the database management system 114 performs database management tasks, including the execution of data quality checks, according to steps of one or more algorithms and is thus configured as a special-purpose machine.
- execution of the digital services 112 by the executable service platform 110 generates data 116 , which are illustrated as being stored in a storage device 118 , e.g., a datastore of the database management system 114 .
- the data 116 describes execution of the digital services 112 in this example as well as values of events that occur during this execution.
- the data 116 may be generated, for instance, by one or more application domains 120 , non-limiting examples of which include a user experience domain 122 , a user behavior domain 124 , and a search and recommendation domain 126 .
- the data 116 may also be retrieved from the storage device 118 for use by the one or more application domains 120 , e.g., in response to a query.
- the data 116 are used by the search and recommendation domain 126 to generate recommendations of items of digital content.
- the recommendations are configurable in a manner similar to performing a search based on a search query to locate the items of digital content.
- Use of the search result as a recommendation supports a variety of usage scenarios, including digital audio recommendations, digital video recommendations, auction items, and so forth.
- the storage device 118 is, or includes, a database 128 .
- a database 128 is a structured collection of data that is organized to enable efficient storage and retrieval of the data 116 .
- the database 128 is organized as a plurality of data tables, shown in FIG.
- Each data table includes rows 132 (also referred to as “records”) and columns 134 (also referred to as “fields”), shown with respect to a single data table in FIG. 1 for illustrative clarity, to further organize the data 116 stored therein.
- the database 128 may include other data structures, e.g., indexes, in addition to data tables.
- the database management system 114 is configured to receive a query 136 , which includes a request or command issued to retrieve or manipulate data stored in the database 128 .
- the query 136 is an expression written in a query language, such as structured query language (SQL), that specifies the criteria and conditions for the desired data.
- SQL structured query language
- the query 136 may be used to extract information from the database 128 , filter data based on specific conditions, perform calculations, and/or combine data from multiple data tables.
- the query 136 is submitted by a user, such as will be elaborated below.
- the query 136 is submitted by an application, such as via one of the application domains 120 .
- the query 136 may be submitted as part of an extract, transform, load (ETL) process used to extract the data 116 from various sources, transform it into a suitable format, and load it into the database 128 .
- the ETL process may be user-initiated or automated via a schedule (e.g., at a pre-determined time interval or time of day) and/or in response to a pre-determined trigger event, e.g., a data arrival event and/or a data file modification event.
- a pre-determined trigger event e.g., a data arrival event and/or a data file modification event.
- the query 136 may be generated by other events and processes without departing from the scope of the described techniques.
- the query 136 is received by a query optimizer 138 of the database management system 114 .
- the query optimizer 138 analyzes the query 136 and determines the most efficient way to fulfill the query 136 , such as to retrieve or manipulate the data 116 specified in the query 136 .
- This analysis produces a query execution plan, which outlines steps that the database management system 114 will take to execute the query 136 and, for example, retrieve the data 116 from the database 128 .
- the query execution plan includes a series of steps (e.g., operators) that are executed in a specific order to retrieve the data 116 .
- Each operator in the query execution plan corresponds to a specific task, such as filtering data or joining tables.
- the query optimizer 138 determines the order of these operators based on the estimated computational cost of each operator and available resources of the service provider system 102 .
- the query 136 includes a data quality check hint 140 .
- the data quality check hint 140 is a subquery of the query 136 that is added to the query 136 as a comment, e.g., a hint.
- hints are additional instructions that can be added to the query 136 to provide guidance to the query optimizer 138 on how to execute the query 136 .
- the data quality check hint 140 influences the query execution plan.
- the data quality check hint 140 includes instructions for injecting a data quality check into the query execution plan, as will be further described with respect to FIG. 2 .
- the data quality check hint 140 may identify, or may be used by the query optimizer 138 to identify, a data quality check instruction (also referred to as a “check rule” herein) from among a plurality of data quality check instructions 142 stored in a storage device 144 .
- a data quality check instruction also referred to as a “check rule” herein
- Individual check rules of the plurality of data quality check instructions 142 may have different logic and/or variables with respect to each other.
- the storage device 144 is shown included in the database management system 114 , it is to be appreciated that the storage device 144 may be remote from other components of the database management system 114 , such as located in a server that is remote from the database 128 .
- the check rules of the plurality of data quality check instructions 142 include one or more conditions that, when satisfied, enable execution of the query 136 by an execution engine 146 , and, when not satisfied, block execution of the query 136 by the execution engine 146 .
- Each check rule of the plurality of data quality check instructions 142 may be a different type of data quality check that includes a different one or more conditions with respect to each other. In this way, the query 136 may not complete when the data quality check does not pass.
- success or failure of the query 136 itself is recorded in a data table, which may be included in the database 128 or in a different database. Specific examples of data quality check rules and conditions will be provided herein, for example, with respect to FIG. 2 .
- the query optimizer 138 in response to receiving the query 136 including the data quality check hint 140 , identifies (e.g., selects) and retrieves the appropriate check rule of the plurality of data quality check instructions 142 based on the check rule identified in the data quality check hint 140 (e.g., based on a keyword or name identification) and/or base on a context of the query 136 .
- the logic of the identified check rule is inserted into the query 136 such that the query 136 is enhanced and/or rewritten to include the corresponding data quality check.
- the query optimizer 138 retrieves a previously generated query execution plan that is stored in memory and used for executions of the same query 136 (e.g., the same SQL statement), and so the query optimizer 138 modifies this previously generated query execution plan to include the data quality check specified by the data quality check hint 140 . Additionally or alternatively, the query optimizer 138 newly generates at least parts of the query execution plan, including the data quality check specified by the data quality check hint 140 .
- the check rule executes along with the query 136 , computational efficiency is increased compared to performing a data quality check as a separate query. For example, an overall runtime and resource consumption is decreased compared with running a data quality check separately from the query 136 . Moreover, because the data quality check blocks completion of the query 136 by using the data quality check hint 140 according to the techniques described herein, the database 128 is maintained more accurately then when the data quality check is performed in a pipeline that is independent from the query 136 .
- the database management system 114 is configured to generate a database management user interface 148 , which is illustrated as accessed by the computing device 104 via the network 108 using a communication module 150 , e.g., a browser, a network-enabled application, or the like.
- a communication module 150 e.g., a browser, a network-enabled application, or the like.
- the database management user interface 148 is configured to receive inputs to specify, for example, a variable, a status (e.g., enabled or disabled), a condition, and/or a threshold of individual check rules of the plurality of data quality check instructions 142 , shown collectively as parameters 154 in FIG. 1 .
- these parameters 154 are usable as part of database management to support automated generation of the check rule logic (e.g., as part of a SQL statement and/or the query execution plan) by the database management system 114 , automatically and without user intervention.
- the parameters 154 may include variables, e.g., programmable variables.
- the variables are named placeholders that, once defined (e.g., by the user), enable the manipulation of values within a SQL statement and further enable the values to be passed between different parts of a SQL script, between different queries, and the like.
- a variable may specify an initial value, a data type, and/or a data storage location of a target value.
- the variables, along with the other parameters 154 are loaded from the storage device 144 (e.g., by the database management system 114 ) automatically and without user intervention based on the data quality check hint 140 specified in the query 136 .
- the database management user interface 148 is further usable to receive user input that submits the query 136 to the database management system 114 .
- a user may generate (e.g., manually input) or otherwise select the particular clauses, keywords, and parameters included in the query 136 , including the data quality check hint 140 .
- the database management user interface 148 may include functionality to guide the user through the selection of various data retrieval and manipulation functions and automatically generate the query 136 in the relevant query language based on the user's selections.
- the user may manually compose the query 136 in the relevant query language without guidance, or with reduced guidance, from the database management user interface 148 .
- the query 136 may be automatically generated by the one or more application domains 120 , such as mentioned above.
- FIG. 2 depicts an example implementation 200 showing operation of the query optimizer 138 and the execution engine 146 of the database management system 114 of FIG. 1 in greater detail.
- the following discussion describes techniques that are implementable utilizing the previously described systems and devices.
- the query optimizer 138 receives the query 136 , which includes the data quality check hint 140 .
- the data quality check hint 140 is one of a record check hint 202 , a duplicate check hint 204 , a duplicate join hint 206 , a reference check hint 208 , and a scalar query hint 210 . Ellipses denote that one or more other types of data quality hints may exist without departing from the scope of the described techniques.
- the query optimizer 138 performs a retrieve instruction operation 212 to retrieve a corresponding data quality check instruction of the plurality of data quality check instructions 142 from the storage device 144 .
- the plurality of data quality check instructions 142 include a record check instruction 214 , a duplicate check instruction 216 , a duplicate join instruction 218 , a reference check instruction 220 , and a scalar query instruction 222 .
- the retrieve instruction operation 212 includes retrieving the record check instruction 214 in response to the data quality check hint 140 being the record check hint 202 , retrieving the duplicate check instruction 216 in response to the data quality check hint 140 being the duplicate check hint 204 , retrieving the duplicate join instruction 218 in response to the query 136 being the duplicate join hint 206 , retrieving the reference check instruction 220 in response to the data quality check hint 140 being the reference check hint 208 , or retrieving the scalar query instruction 222 in response to the data quality check hint 140 being the scalar query hint 210 .
- Ellipses denote that one or more other check rules may exist in the plurality of data quality check instructions 142 .
- the different check rules of the plurality of data quality check instructions 142 include logic (e.g., instructions) for implementing a data quality check.
- the record check instruction 214 includes logic for checking a target record of a target data table, where the target record and the target data table are specified (e.g., as programmable variables) in the record check instruction 214 .
- a record check data quality check operation specified by the record check instruction 214 is satisfied in response to the target record being present in the target data table and is not satisfied in response to the target record being absent from the target data table.
- the duplicate check instruction 216 includes logic for checking whether duplicate records exist in a target data table specified in the duplicate check instruction 216 .
- a duplicate check data quality check operation specified by the duplicate check instruction 216 is satisfied, for example, in response to an absence of duplicate records in the target data table and is not satisfied in response to duplicate records being present in the target data table, as will be elaborated with respect to FIG. 5 .
- the duplicate join instruction 218 includes logic for anticipating records duplication during a join operation of two target data tables. For example, as will be elaborated with respect to FIG. 6 , a duplicate join data quality check operation specified by the duplicate join instruction 218 is satisfied in response to a join result of the two target data tables not including duplicate records, such as when the join result includes records that are unique with respect to each other. Conversely, the duplicate join instruction 218 is not satisfied in response to the join result of the two target data tables including duplicate records, such as when records in the join result are not unique with respect to each other.
- the reference check instruction 220 includes logic for determining whether two target data tables specified in the reference check instruction 220 can be joined based on overlapping records.
- a reference check data quality check operation is satisfied in response to a specified reference from a first target data table being present in a second target data table and is not satisfied in response to the specified reference being absent from the second target data table.
- the specified reference from the first target data table is a,b,c and the second target data table includes a,b (and not c)
- the reference check data quality check operation is not satisfied.
- the second target data table includes a,b,c or a,b,c,d, for instance, the reference check data quality check operation is satisfied.
- the scalar query instruction 222 includes logic for implementing a programmable subquery that returns a single value, which may be transformed into a Boolean value that indicates “true” or “false,” thus providing a binary output.
- the programmable subquery may be a complex instruction.
- the programmable subquery may retrieve a maximum or a minimum value from a target data table specified in the scalar query instruction 222 , check for the existence of specific data based on certain conditions (e.g., a value greater than zero or a non-zero threshold), retrieve a single value as an aggregate function applied to a column (e.g., a sum or average), or return a single value as another type of calculation.
- a scalar query data quality check specified by the scalar query instruction 222 is satisfied in response to the Boolean being “true” and is not satisfied in response to the Boolean being “false.”
- the query optimizer 138 After fetching the appropriate data quality check instruction 142 , the query optimizer 138 generates a query execution plan 224 based on the query 136 and the retrieved data quality check instruction specified in the data quality check hint 140 . Accordingly, the query execution plan 224 includes a data quality check 226 , such as the record check data quality check, the duplicate check data quality check, the duplicate join data quality check, the reference check data quality check, or the scalar query data quality check described above.
- a data quality check 226 such as the record check data quality check, the duplicate check data quality check, the duplicate join data quality check, the reference check data quality check, or the scalar query data quality check described above.
- the query execution plan 224 is a detailed strategy outlining the steps and operations the execution engine 146 will perform to retrieve and/or manipulate the data requested in the query 136 based on, for example, table sizes, available indexes, and/or other factors that affect execution time and resource usages in order to increase (e.g., maximize) efficiency and reduce (e.g., minimize) computing resource consumption.
- the query optimizer 138 generates the query execution plan 224 so that completion of the query execution plan 224 , and thus the query 136 itself, is dependent on a result of the data quality check 226 .
- the query execution plan 224 is modified compared to when the query 136 does not include the data quality check hint 140 .
- the data quality check 226 includes one or more conditional clauses within the query execution plan 224 that blocks further execution of the query execution plan 224 in response to the one or more conditional clauses not being satisfied and enables execution of the remaining steps of the query execution plan 224 in response to the one or more conditional clauses being satisfied.
- the execution engine 146 receives the query execution plan 224 from the query optimizer 138 and performs the operations specified in the query execution plan 224 directly and/or by managing computing resources of the database management system 114 during execution.
- the execution engine 146 employs access methods to scan tables and/or use indexes of the database 128 in order to access and/or manipulate the data stored therein, depicted as a data access operation 228 in FIG. 2 .
- the execution engine 146 generates an output 230 , which may include one or more of a query result 232 , a data quality check result 234 , and an exception 236 .
- the output 230 includes both of the query result 232 and the data quality check result 234 when the data quality check 226 is satisfied (e.g., succeeds) or includes the data quality check result 234 and the exception 236 , and not the query result 232 , when the data quality check 226 is not satisfied (e.g., fails) due to the dependence on the data quality check result 234 for query execution.
- the query result 232 may include data specified in the query 136 , results of the data manipulation operations specified in the query 136 , and/or a report of the operations performed by the execution engine 146 .
- the data quality check result 234 may indicate whether the data quality check 226 succeeded or failed.
- the exception 236 may include instructions for specific actions to be performed, e.g., by the database management system 114 , in response to the data quality check 226 failing. The actions may include, for example, logging the failure, rolling back an operation, and/or communicating reasons for the failure to an administrator of the database 128 .
- the data quality check hint 140 is usable by the query optimizer 138 to inject a reusable and configurable data quality check 226 into the query execution plan 224 so that the query result 232 is not output by the execution engine 146 when the data quality check 226 , as defined by an associated check rule of the plurality of data quality check instructions 142 , fails.
- high data integrity can be maintained in the database 128 with reduced job execution times and computing resource usage.
- This section describes examples of procedures for a data quality check using a structured query language hint. Aspects of the procedures may be implemented in hardware, firmware, software, or a combination thereof. The procedures are shown as a set of blocks that specify operations performed by one or more devices and are not necessarily limited to the orders shown for performing the operations by the respective blocks.
- FIG. 3 is a flow diagram depicting an algorithm as a step-by-step procedure 300 in an example of implementing a data quality check using a query hint.
- the procedure 300 is performable by a database management system (e.g., the database management system 114 ), including the query optimizer 138 and the execution engine 146 . While the procedure 300 is a high-level method for executing the data quality check that is applicable to a plurality of different types of data quality checks, additional procedures for executing specific types of data quality checks are also described herein, e.g., with respect to FIGS. 4 - 8 .
- a database query statement for a database query is received by a query optimizer (block 302 ).
- the database query statement may be provided to the database management system 114 by a user (e.g., via the database management user interface 148 ) or scheduled by, for example, the one or more application domains 120 .
- the database query statement may specify desired data retrieval and/or manipulation operations to be performed during the database query, such as data movement and integration operations of ETL processes, and may be communicated in a programming language (e.g., structured query language) that is understood by the query optimizer 138 .
- the database query statement may include one or more clauses that define at least one operation to be performed on data within the database 128 . Non-limiting examples of such operations include data retrieval, data updates, data filtering, data sorting, data table joining, data aggregation, data movement, and other data-based calculations.
- a data quality check hint is identified in the database query statement by the query optimizer (block 304 ).
- the query optimizer 138 analyzes the database query statement and identities the presence of the data quality check hint 140 based on hint-specific syntax within the database query statement and a keyword indicating a name of the particular data quality check hint. That is, the query optimizer 138 recognizes the specific keyword (e.g., hint name) and/or other annotation indicating the data quality check hint 140 .
- the query optimizer 138 analyzes the database query statement and identities the presence of the data quality check hint 140 based on hint-specific syntax within the database query statement and a keyword indicating a name of the particular data quality check hint. That is, the query optimizer 138 recognizes the specific keyword (e.g., hint name) and/or other annotation indicating the data quality check hint 140 .
- the specific keyword e.g., hint name
- examples of specific data quality check hints include the record check hint 202 , the duplicate check hint 204 , the duplicate join hint 206 , the reference check hint 208 , and the scalar query hint 210 , although other data quality check hints are also possible.
- a query execution plan is generated, by the query optimizer, based on the database query statement and the data quality check hint (block 306 ).
- the query optimizer 138 analyzes the database query statement and generates the query execution plan 224 based on table sizes, available indexes, execution time, and resource usage.
- the query execution plan 224 includes a series of operators (e.g., steps) that are to be executed in a specific order to perform the data retrieval and/or manipulation operations specified in the query database statement. Each operator in the query execution plan 224 is related to executing a specific task, such as filtering data or joining tables.
- the query optimizer 138 determines the order of these operators based on an estimated computational cost and execution time and the available resources of the database 128 , for example.
- generating the query execution plan includes retrieving a data quality check instruction based on the data quality check hint and/or a context of the database query statement (block 308 ).
- the query optimizer 138 retrieves the data quality check instruction from the storage device 144 based on the keyword indicated in the data quality check hint 140 .
- the storage device 144 may store plurality of data quality check instructions 142 in a manner that enables efficient updates to the parameters 154 , e.g., via the database management user interface 148 .
- the data quality check instruction includes one or more predefined rules, constraints, and/or criteria, collectively referred to as a “condition,” that differentiates reliable data from unreliable data.
- Generating the query execution plan further includes inserting the data quality check instruction into the query execution plan so that completion of the query execution plan is dependent on satisfying a condition of the data quality check instruction (block 310 ).
- the operators are executed in order, and so the query optimizer 138 builds the query execution plan 224 such that the data retrieval and/or manipulation operations cannot be completed unless the condition of the data quality check instruction is satisfied.
- the query optimizer 138 modifies the query execution plan 224 compared to when the data quality check hint 140 is not included in the query 136 .
- the condition is specified by the data quality check being performed and evaluates whether a quality, accuracy, completeness, consistency, and/or validity of data stored in the database, such as within one or more target data tables specified in the database query statement, are as expected.
- the query execution plan including the condition of the data quality check instruction, is received by an execution engine (block 312 ).
- the execution engine 146 interprets the query execution plan 224 and carries out the operators in the order specified, such as by accessing tables and/or indexes of the database 128 to complete the data quality check 226 before completing other data retrieval and/or manipulation operations specified in the database query statement.
- Specific, non-limiting examples of the conditions and operations performed by the execution engine 146 while completing the data quality check 226 are described herein, e.g., with respect to FIGS. 2 and 4 - 8 .
- the exception and a result of the data quality check are recorded (block 318 ).
- the execution engine 146 or another component of the database management system 114 , logs the exception and the result of the data quality check in a data table, such as in one of the tables of the database 128 or in another database.
- An entry may be added to the data table that includes, for instance, a job identifier, a time stamp (e.g., a date and time), information regarding the database query statement that was attempted, information regarding which data quality check was performed (e.g., the keyword or hint name), an indication that the data quality check failed, and the exception that was output.
- Results of the database query and the data quality check are output (block 322 ).
- the results of the database query include a final output of the query execution, such as the selected data, results of the data manipulation operations, and/or a report of the operations performed by the execution engine 146 .
- an entry may be added to the data table that records the results of the data quality check, including, for example, the job identifier, the time stamp, information regarding the database query statement, information regarding the data quality check that was performed, and an indication that the data quality check succeeded.
- the record check data quality check instruction is performed by the execution engine to check for the target record of the target data table (block 404 ).
- the record check data quality check instruction may include a check projection that is configured as a constraint in the query execution plan 224 .
- execution of the check projection may either block completion of the remaining operators of the query execution plan 224 if the target record is absent in the target data table or enable completion of the remaining operators of the query execution plan 224 if the target record is present in the target data table, as will be elaborated below.
- Execution of the check projection for instance, may include scanning, by the execution engine 146 , the target data table (or a specified column in the target data table) row-by-row for the target record.
- a data retrieval and/or manipulation operation specified in the query execution plan is performed by the execution engine (block 408 ).
- the constraint is “true” (e.g., the target record is not null)
- the condition of the data quality check 226 is satisfied, and the remaining operator(s) of the query execution plan 224 are executed by the execution engine 146 , such as described above with respect to block 320 of FIG. 3 .
- a query result is recorded (block 410 ).
- the query result 232 such as the selected data, results of the data manipulation operations, and/or a report of the operations performed by the execution engine 146 , are stored in memory so that the query result 232 may be accessed by a user and/or the one or more application domains 120 that submitted the query 136 .
- a query execution plan having a duplicate check data quality check instruction is received by an execution engine of a database management system (block 502 ).
- the execution engine 146 receives the query execution plan 224 having the data quality check 226 from the query optimizer 138 , such as described above with respect to FIGS. 2 and 3 .
- the data quality check 226 corresponds to the duplicate check instruction 216 .
- the duplicate check instruction 216 includes logic for sorting records in a target data table and identifying whether duplicate records are present.
- the target data table may be identified via an operator that specifies a table name, e.g., the first data table 130 ( 1 ) or another table of the database 128 .
- a data retrieval and/or manipulation operation specified in the query execution plan is performed by the execution engine (block 508 ).
- the constraint is “true” (e.g., the adjacent records are unique)
- the condition of the data quality check 226 is satisfied, and the remaining operator(s) of the query execution plan 224 are executed by the execution engine 146 , such as described above with respect to block 320 of FIG. 3 .
- the exception is recorded (block 514 ).
- the exception 236 is recorded in a data table of the database 128 or in another database.
- the data table may be output via the database management user interface 148 and displayed via the display device 152 so that an administrator of the database management system 114 is alerted to the exception 236 .
- FIG. 6 is a flow diagram depicting an algorithm as a step-by-step procedure 600 in an example of implementing a data quality check for records duplication while executing a join operation of a database query.
- the procedure 600 is performable by a database management system (e.g., the database management system 114 ), including the query optimizer 138 and the execution engine 146 .
- the procedure 600 may performed as a part of the procedure 300 of FIG. 3 , e.g., at block 312 .
- the procedure 600 may be performed independently from the procedure 300 .
- the database management system 114 may implement a data quality check that includes identifying whether a join operation will duplicate records in the database by executing operations that differ from those described below without departing from the scope of the present disclosure.
- a query execution plan having a duplicate join data quality check instruction is received by an execution engine of a database management system (block 602 ).
- the execution engine 146 receives the query execution plan 224 having the data quality check 226 from the query optimizer 138 , such as described above with respect to FIGS. 2 and 3 .
- the data quality check 226 corresponds to the duplicate join instruction 218 .
- the duplicate join instruction 218 includes logic for sorting records in a projection of a join result of two target data tables and identifying whether duplicate records are present.
- the two target data tables may be identified via at least one operator that specifies the table names, e.g., the first data table 130 ( 1 ) and the second data table 130 ( 2 ).
- the duplicate join data quality check instruction is performed by the execution engine to determine if adjacent records are unique in a sorted window of a join result of two target data tables (block 604 ).
- the duplicate join data quality check instruction may include a join clause (e.g., a left join, a right join, or a full join), a window expression, and a unique identification constraint.
- the join clause when executed, may cause records from a first data table of the two target data tables (e.g., the first data table 130 ( 1 )) to be joined with records from a second data table of the two target data tables (e.g., the second data table 130 ( 2 )) to create a new data table, or the join result.
- the window expression when executed, may cause data in the join result to be sorted by identification name without altering the data in the join result.
- Execution of the unique identification constraint e.g., by the execution engine 146 ) may block completion of the remaining operator(s) of the query execution plan 224 if duplicate records are found or enable completion of the remaining operator(s) of the query execution plan 224 if duplicate records are not found, as will be elaborated below.
- the adjacent records are unique (block 606 ).
- the adjacent records are unique if no record has a same identification name as another record in the join result.
- the adjacent records are not unique if a first record has the same identification name as a second record in the join result.
- the constraint (e.g., condition) of the data quality check 226 includes the adjacent records in the sorted version of the join result being unique.
- a join operation of the two target data tables is performed by the execution engine (block 608 ).
- the constraint is “true” (e.g., the adjacent records are unique)
- the condition of the data quality check 226 is satisfied, and the remaining operator(s) of the query execution plan 224 are executed by the execution engine 146 , such as described above with respect to block 320 of FIG. 3 .
- this includes joining the two target data tables according to the specified join clause.
- a query result is recorded (block 610 ).
- the query result 232 e.g., the join result
- the join result is stored in memory so that the query result 232 may be accessed by a user and/or the one or more application domains 120 that submitted the query 136 .
- the join result may be stored as a new data table in the database 128 .
- an exception is output that blocks completion of the join operation (block 612 ).
- the constraint is “false” (e.g., the adjacent records are not unique)
- the condition of the data quality check 226 is not satisfied, and the remaining operator(s) of the query execution plan 224 , including recording the join result, are not executed by the execution engine 146 .
- the exception 236 may provide instructions for recording the failure of the query 136 and/or remediating the data quality issue.
- the exception is recorded (block 614 ).
- the exception 236 is recorded in a data table of the database 128 or in another database.
- the data table may be output via the database management user interface 148 and displayed via the display device 152 so that an administrator of the database management system 114 is alerted to the exception 236 .
- FIG. 7 is a flow diagram depicting an algorithm as a step-by-step procedure 700 in an example of implementing a data quality check for a shared record between two target data tables while executing a database query.
- the procedure 700 is performable by a database management system (e.g., the database management system 114 ), including the query optimizer 138 and the execution engine 146 .
- the procedure 700 may performed as a part of the procedure 300 of FIG. 3 , e.g., at block 312 .
- the procedure 700 may be performed independently from the procedure 300 .
- the database management system 114 may implement a data quality check that includes checking data between two target data tables by executing operations that differ from those described below without departing from the scope of the present disclosure.
- a query execution plan having a reference check data quality check instruction is received by an execution engine of a database management system (block 702 ).
- the execution engine 146 receives the query execution plan 224 having the data quality check 226 from the query optimizer 138 , such as described above with respect to FIGS. 2 and 3 .
- the data quality check 226 corresponds to the reference check instruction 220 .
- the reference check instruction 220 includes logic for a check projection that performs a reference check between two target data tables.
- the two target data tables may be identified via at least one operator that specifies the table names, e.g., the first data table 130 ( 1 ) and the second data table 130 ( 2 ).
- the reference check data quality check instruction is performed by the execution engine to identify whether a reference from a first data table is present in a second data table (block 704 ).
- the reference check data quality check instruction may include a join clause (e.g., a left join, a right join, or a full join) and a check projection.
- the join clause when executed, may cause records from the first data table (e.g., the first data table 130 ( 1 )) to be joined with records from the second data table (e.g., the second data table 130 ( 2 )) to create a new data table, or a join result.
- Execution of the check projection may include scanning, by the execution engine 146 , the second data table (or a specified column in the second data table) row-by-row for a specified reference from the first data table, which may be indicated by an identification name.
- the check projection may be configured as a constraint in the query execution plan 224 . As such, execution of the check projection may either block completion of the remaining operators of the query execution plan 224 if the reference is not found in the second data table or enable completion of the remaining operators of the query execution plan 224 if the reference is found in the second data table, as will be elaborated below.
- the target record is identified in the second data table if the reference is present in the second data table.
- the target record is not identified in the second data table if the reference is missing from the second data table.
- the constraint (e.g., condition) of the data quality check 226 includes the reference from the first data table being present in the second data table.
- a join operation of the two target data tables is performed by the execution engine (block 708 ).
- the constraint is “true” (e.g., the reference is found in the second data table)
- the condition of the data quality check 226 is satisfied, and the remaining operator(s) of the query execution plan 224 are executed by the execution engine 146 , such as described above with respect to block 320 of FIG. 3 .
- this includes joining the two target data tables according to the specified join clause.
- a query result is recorded (block 710 ).
- the query result 232 e.g., the join result
- the join result is stored in memory so that the query result 232 may be accessed by a user and/or the one or more application domains 120 that submitted the query 136 .
- the join result may be stored as a new data table in the database 128 .
- an exception is output that blocks completion of the join operation (block 712 ).
- the constraint is “false” (e.g., the reference is not found in the second data table)
- the condition of the data quality check 226 is not satisfied, and the remaining operator(s) of the query execution plan 224 , including recording the join result, are not executed by the execution engine 146 .
- the exception 236 may provide instructions for recording the failure of the query 136 and/or remediating the data quality issue.
- the exception is recorded (block 714 ).
- the exception 236 is recorded in a data table of the database 128 or in another database.
- the data table may be output via the database management user interface 148 and displayed via the display device 152 so that an administrator of the database management system 114 is alerted to the exception 236 .
- FIG. 8 is a flow diagram depicting an algorithm as a step-by-step procedure 800 in an example of implementing a data quality check including a scalar subquery while executing a database query.
- the procedure 800 is performable by a database management system (e.g., the database management system 114 ), including the query optimizer 138 and the execution engine 146 .
- the procedure 800 may performed as a part of the procedure 300 of FIG. 3 , e.g., at block 312 .
- the procedure 800 may be performed independently from the procedure 300 .
- the database management system 114 may implement a scalar query data quality check by executing operations that differ from those described below without departing from the scope of the present disclosure.
- a query execution plan having a scalar query data quality check instruction is received by an execution engine of a database management system (block 802 ).
- the execution engine 146 receives the query execution plan 224 having the data quality check 226 from the query optimizer 138 , such as described above with respect to FIGS. 2 and 3 .
- the data quality check 226 corresponds to the scalar query instruction 222 .
- the scalar query instruction 222 includes logic for performing any subquery or calculation that returns a single value (e.g., a scalar), such as a minimum function, a maximum function, an average function, or a data retrieval function.
- the scalar query data quality check instruction may include one or more operators for executing the specified subquery (e.g., the calculation or data retrieval that returns the single value), operators for converting the single value to the Boolean, and a constraint to check if the Boolean is “true.”
- the subquery and/or the rules for converting the single value to the Boolean are programmed via the database management user interface 148 .
- the target data table may be identified via an operator that specifies a table name, e.g., the first data table 130 ( 1 ) or another table of the database 128 .
- execution of the Boolean constraint may block completion of the remaining operator(s) of the query execution plan 224 if the Boolean is “false” or enable completion of the remaining operator(s) of the query execution plan 224 if the Boolean is “true,” as will be elaborated below.
- the Boolean is either “true” (e.g., a first value, such as one) or “false” (e.g., a second value, such as zero), and thus, the single value is determined to be “true” or “false” according to a condition of the rules.
- the rules may define “true” as a value greater than a pre-defined threshold value (e.g., zero or a non-zero value), as set via the parameters 154 .
- Boolean If the Boolean is “true,” a data retrieval and/or manipulation operation specified in the query execution plan is performed by the execution engine (block 810 ). By way of example, because the Boolean is “true,” the condition of the data quality check 226 is satisfied, and the remaining operator(s) of the query execution plan 224 are executed by the execution engine 146 , such as described above with respect to block 320 of FIG. 3 .
- the exception is recorded (block 816 ).
- the exception 236 is recorded in a data table of the database 128 or in another database.
- the data table may be output via the database management user interface 148 and displayed via the display device 152 so that an administrator of the database management system 114 is alerted to the exception 236 .
- the example computing device 902 as illustrated includes a processing system 904 , one or more computer-readable media 906 , and one or more I/O interfaces 908 that are communicatively coupled, one to another.
- the computing device 902 may further include a system bus or other data and command transfer system that couples the various components, one to another.
- a system bus can include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures.
- a variety of other examples are also contemplated, such as control and data lines.
- the processing system 904 is representative of functionality to perform one or more operations using hardware. Accordingly, the processing system 904 is illustrated as including hardware elements 910 that may be configured as processors, functional blocks, and so forth. This may include implementation in hardware as an application specific integrated circuit or other logic device formed using one or more semiconductors.
- the hardware elements 910 are not limited by the materials from which they are formed or the processing mechanisms employed therein.
- processors may be comprised of semiconductor(s) and/or transistors (e.g., electronic integrated circuits (ICs)).
- processor-executable instructions may be electronically executable instructions.
- the computer-readable media 906 is illustrated as including memory/storage 912 .
- the memory/storage 912 represents memory/storage capacity associated with one or more computer-readable media.
- the memory/storage 912 may include volatile media (such as random-access memory (RAM)) and/or nonvolatile media (such as read only memory (ROM), Flash memory, optical disks, magnetic disks, and so forth).
- RAM random-access memory
- ROM read only memory
- Flash memory optical disks
- magnetic disks magnetic disks, and so forth
- the memory/storage 912 may include fixed media (e.g., RAM, ROM, a fixed hard drive, and so on) as well as removable media (e.g., Flash memory, a removable hard drive, an optical disc, and so forth).
- the computer-readable media 906 may be configured in a variety of other ways as further described below.
- Input/output interface(s) 908 are representative of functionality to allow a user to enter commands and information to computing device 902 , and also allow information to be presented to the user and/or other components or devices using various input/output devices.
- input devices include a keyboard, a cursor control device (e.g., a mouse), a microphone, a scanner, touch functionality (e.g., capacitive or other sensors that are configured to detect physical touch), a camera (e.g., which may employ visible or non-visible wavelengths such as infrared frequencies to recognize movement as gestures that do not involve touch), and so forth.
- Examples of output devices include a display device (e.g., a monitor or projector), speakers, a printer, a network card, tactile-response device, and so forth.
- the computing device 902 may be configured in a variety of ways as further described below to support user interaction.
- Computer-readable media may include a variety of media that may be accessed by the computing device 902 .
- computer-readable media may include “computer-readable storage media” and “computer-readable signal media.”
- Computer-readable signal media may refer to a signal-bearing medium that is configured to transmit instructions to the hardware of the computing device 902 , such as via a network.
- Signal media typically may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as carrier waves, data signals, or other transport mechanism.
- Signal media also include any information delivery media.
- modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
- communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media.
- software, hardware, or executable modules may be implemented as one or more instructions and/or logic embodied on some form of computer-readable storage media and/or by one or more hardware elements 910 .
- the computing device 902 may be configured to implement particular instructions and/or functions corresponding to the software and/or hardware modules. Accordingly, implementation of a module that is executable by the computing device 902 as software may be achieved at least partially in hardware, e.g., through use of computer-readable storage media and/or hardware elements 910 of the processing system 904 .
- the instructions and/or functions may be executable/operable by one or more articles of manufacture (for example, one or more computing devices 902 and/or processing systems 904 ) to implement techniques, modules, and examples described herein.
- the techniques described herein may be supported by various configurations of the computing device 902 and are not limited to the specific examples of the techniques described herein. This functionality may also be implemented all or in part through use of a distributed system, such as over a “cloud” 914 via a platform 916 as described below.
- the platform 916 may abstract resources and functions to connect the computing device 902 with other computing devices.
- the platform 916 may also serve to abstract scaling of resources to provide a corresponding level of scale to encountered demand for the resources 918 that are implemented via the platform 916 .
- implementation of functionality described herein may be distributed throughout the system 900 .
- the functionality may be implemented in part on the computing device 902 as well as via the platform 916 that abstracts the functionality of the cloud 914 .
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (17)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/362,882 US12475088B2 (en) | 2023-07-31 | 2023-07-31 | Data quality check using a structured query language hint |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/362,882 US12475088B2 (en) | 2023-07-31 | 2023-07-31 | Data quality check using a structured query language hint |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20250045257A1 US20250045257A1 (en) | 2025-02-06 |
| US12475088B2 true US12475088B2 (en) | 2025-11-18 |
Family
ID=94387236
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/362,882 Active 2043-08-25 US12475088B2 (en) | 2023-07-31 | 2023-07-31 | Data quality check using a structured query language hint |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US12475088B2 (en) |
Citations (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080091647A1 (en) | 2006-10-11 | 2008-04-17 | International Business Machines Corporation | Tool and a method for customizing hint |
| US20080120283A1 (en) * | 2006-11-17 | 2008-05-22 | Oracle International Corporation | Processing XML data stream(s) using continuous queries in a data stream management system |
| US20080133458A1 (en) * | 2006-12-01 | 2008-06-05 | Microsoft Corporation | Statistics adjustment to improve query execution plans |
| US7739263B2 (en) | 2003-09-06 | 2010-06-15 | Oracle International Corporation | Global hints |
| US20100250518A1 (en) * | 2009-03-28 | 2010-09-30 | Microsoft Corporation | Flexible query hints in a relational database |
| CN102222088A (en) | 2011-05-30 | 2011-10-19 | 大连银行股份有限公司 | System and method for checking, summarizing and displaying data quality according to multidimensional attribute |
| US20120191698A1 (en) * | 2011-01-20 | 2012-07-26 | Accenture Global Services Limited | Query plan enhancement |
| US20140089294A1 (en) * | 2012-09-26 | 2014-03-27 | Microsoft Corporation | Sql generation for assert, update and delete relational trees |
| US9965510B2 (en) | 2015-06-22 | 2018-05-08 | International Business Machines Corporation | Partition access method for query optimization |
| US10592506B1 (en) | 2013-02-13 | 2020-03-17 | Amazon Technologies, Inc. | Query hint specification |
| US10810202B2 (en) | 2018-06-14 | 2020-10-20 | Microsoft Technology Licensing, Llc | Execution plan stitching |
| US20210026855A1 (en) * | 2019-07-23 | 2021-01-28 | Oracle International Corporation | Influencing plan generation in the context of the two phase query optimization approach |
| CN112286912A (en) | 2020-08-12 | 2021-01-29 | 上海柯林布瑞信息技术有限公司 | Medical data quality checking method and device, terminal and storage medium |
| CN112396343A (en) | 2020-11-30 | 2021-02-23 | 北京中电普华信息技术有限公司 | Data quality checking method and device |
| US11100104B2 (en) | 2019-04-09 | 2021-08-24 | Accenture Global Solutions Limited | Query tuning utilizing optimizer hints |
| CN115495276A (en) | 2022-11-18 | 2022-12-20 | 北京奥星贝斯科技有限公司 | Abnormity detection method, device, equipment and readable storage medium |
| CN115544072A (en) | 2022-10-10 | 2022-12-30 | 浪潮软件集团有限公司 | A method and system for automatically correcting unhealthy database statements |
| US20230117461A1 (en) * | 2021-10-15 | 2023-04-20 | Oracle International Corporation | Method and system to implement error handling with fine-grained sql hints |
| US20250021554A1 (en) * | 2023-07-12 | 2025-01-16 | Sap Se | Patterned query statements with hints |
-
2023
- 2023-07-31 US US18/362,882 patent/US12475088B2/en active Active
Patent Citations (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7739263B2 (en) | 2003-09-06 | 2010-06-15 | Oracle International Corporation | Global hints |
| US20080091647A1 (en) | 2006-10-11 | 2008-04-17 | International Business Machines Corporation | Tool and a method for customizing hint |
| US20080120283A1 (en) * | 2006-11-17 | 2008-05-22 | Oracle International Corporation | Processing XML data stream(s) using continuous queries in a data stream management system |
| US20080133458A1 (en) * | 2006-12-01 | 2008-06-05 | Microsoft Corporation | Statistics adjustment to improve query execution plans |
| US20100250518A1 (en) * | 2009-03-28 | 2010-09-30 | Microsoft Corporation | Flexible query hints in a relational database |
| US20120191698A1 (en) * | 2011-01-20 | 2012-07-26 | Accenture Global Services Limited | Query plan enhancement |
| CN102222088A (en) | 2011-05-30 | 2011-10-19 | 大连银行股份有限公司 | System and method for checking, summarizing and displaying data quality according to multidimensional attribute |
| US20140089294A1 (en) * | 2012-09-26 | 2014-03-27 | Microsoft Corporation | Sql generation for assert, update and delete relational trees |
| US10592506B1 (en) | 2013-02-13 | 2020-03-17 | Amazon Technologies, Inc. | Query hint specification |
| US9965510B2 (en) | 2015-06-22 | 2018-05-08 | International Business Machines Corporation | Partition access method for query optimization |
| US10810202B2 (en) | 2018-06-14 | 2020-10-20 | Microsoft Technology Licensing, Llc | Execution plan stitching |
| US11100104B2 (en) | 2019-04-09 | 2021-08-24 | Accenture Global Solutions Limited | Query tuning utilizing optimizer hints |
| US20210026855A1 (en) * | 2019-07-23 | 2021-01-28 | Oracle International Corporation | Influencing plan generation in the context of the two phase query optimization approach |
| CN112286912A (en) | 2020-08-12 | 2021-01-29 | 上海柯林布瑞信息技术有限公司 | Medical data quality checking method and device, terminal and storage medium |
| CN112396343A (en) | 2020-11-30 | 2021-02-23 | 北京中电普华信息技术有限公司 | Data quality checking method and device |
| US20230117461A1 (en) * | 2021-10-15 | 2023-04-20 | Oracle International Corporation | Method and system to implement error handling with fine-grained sql hints |
| CN115544072A (en) | 2022-10-10 | 2022-12-30 | 浪潮软件集团有限公司 | A method and system for automatically correcting unhealthy database statements |
| CN115495276A (en) | 2022-11-18 | 2022-12-20 | 北京奥星贝斯科技有限公司 | Abnormity detection method, device, equipment and readable storage medium |
| US20250021554A1 (en) * | 2023-07-12 | 2025-01-16 | Sap Se | Patterned query statements with hints |
Non-Patent Citations (6)
| Title |
|---|
| Bruno, Nicolas , et al., "Power Hints for Query Optimization", 25th International Conference on Data Engineering [retrieved May 24, 2023]. Retrieved from the Internet <https://web.archive.org/web/20180723171018id_/https://www.microsoft.com/en-us/research/wp-content/uploads/2009/01/bruno-powerhints-research-long.pdf>., Apr. 2, 2009, 12 Pages. |
| Filho, Edson Ramiro Lucas , et al., "Don't Tune Twice: Reusing Tuning Setups for SQL-on-Hadoop Queries", International Conference on Conceptual Modeling [retrieved May 24, 2023]. Retrieved from the Internet <https://www.inf.ufpr.br/eduardo/papers/Ramiro_ER2019.pdf>., Nov. 4, 2019, 14 Pages. |
| Hemens, Lewis , "Advanced data quality testing with SQL and Dataform", Tada science, Inc., Blog [retrieved May 24, 2023]. Retrieved from the Internet <https://dataform.co/blog/advanced-data-quality-testing>., Mar. 10, 2020, 12 Pages. |
| Bruno, Nicolas , et al., "Power Hints for Query Optimization", 25th International Conference on Data Engineering [retrieved May 24, 2023]. Retrieved from the Internet <https://web.archive.org/web/20180723171018id_/https://www.microsoft.com/en-us/research/wp-content/uploads/2009/01/bruno-powerhints-research-long.pdf>., Apr. 2, 2009, 12 Pages. |
| Filho, Edson Ramiro Lucas , et al., "Don't Tune Twice: Reusing Tuning Setups for SQL-on-Hadoop Queries", International Conference on Conceptual Modeling [retrieved May 24, 2023]. Retrieved from the Internet <https://www.inf.ufpr.br/eduardo/papers/Ramiro_ER2019.pdf>., Nov. 4, 2019, 14 Pages. |
| Hemens, Lewis , "Advanced data quality testing with SQL and Dataform", Tada science, Inc., Blog [retrieved May 24, 2023]. Retrieved from the Internet <https://dataform.co/blog/advanced-data-quality-testing>., Mar. 10, 2020, 12 Pages. |
Also Published As
| Publication number | Publication date |
|---|---|
| US20250045257A1 (en) | 2025-02-06 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12235827B2 (en) | Applications of automated discovery of template patterns based on received requests | |
| US20220019576A1 (en) | Using stored execution plans for efficient execution of natural language questions | |
| US7844570B2 (en) | Database generation systems and methods | |
| US9747335B2 (en) | Generic operator framework | |
| US10789295B2 (en) | Pattern-based searching of log-based representations of graph databases | |
| US11907198B2 (en) | Selective recommendation and deployment of extensions in low-code approach | |
| US20180046562A1 (en) | Identification of code features potentially associated with code behavior | |
| US10956400B2 (en) | Query processing using primary data versioning and secondary data | |
| US11487742B2 (en) | Consistency checks between database systems | |
| US20180150519A1 (en) | Extreme Value Estimation for Query Optimization in Analytical Databases | |
| US20180137161A1 (en) | Data analysis schema and method of use | |
| US20240256426A1 (en) | Runtime error attribution for database queries specified using a declarative database query language | |
| EP3486798A1 (en) | Reporting and data governance management | |
| US20210397745A1 (en) | Data providing server device and data providing method | |
| US11119761B2 (en) | Identifying implicit dependencies between code artifacts | |
| Bidoit et al. | EFQ | |
| US10417234B2 (en) | Data flow modeling and execution | |
| JP2016133946A (en) | Source code reviewing method and system therefor | |
| US11281569B2 (en) | Self-curative computer process automates | |
| US12475088B2 (en) | Data quality check using a structured query language hint | |
| US11106526B1 (en) | Architecture-based root cause analysis | |
| Nevarez | SQL Server Query Tuning and Optimization: Optimize Microsoft SQL Server 2022 Queries and Applications | |
| US20250258654A1 (en) | Retrieval augmented generation based on process artifacts | |
| US10318524B2 (en) | Reporting and data governance management | |
| US10769164B2 (en) | Simplified access for core business with enterprise search |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
| AS | Assignment |
Owner name: EBAY INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WU, SHAOJIE;ZHU, LIMING;ZHU, GENG;SIGNING DATES FROM 20230731 TO 20230802;REEL/FRAME:064466/0872 Owner name: EBAY INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNOR'S INTEREST;ASSIGNORS:WU, SHAOJIE;ZHU, LIMING;ZHU, GENG;SIGNING DATES FROM 20230731 TO 20230802;REEL/FRAME:064466/0872 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: ALLOWED -- NOTICE OF ALLOWANCE NOT YET MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
| STCF | Information on status: patent grant |
Free format text: PATENTED CASE |