HK1176710A1 - Method and device for an extraction-transformation-loading test

Method and device for an extraction-transformation-loading test

Info

Publication number
HK1176710A1
Authority
HK
Hong Kong
Prior art keywords
test
objects
etl
sub
testing
Prior art date
Application number
HK13103671.5A
Other languages
Chinese (zh)
Other versions
HK1176710B (en)
Inventor
孟祥敏
Original Assignee
Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Publication of HK1176710A1
Publication of HK1176710B

Landscapes

  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for ETL (extract-transform-load) tests. The method includes: dividing ETL test objects into test subobjects according to preset rules; and testing the test subobjects one by one. Complexity of ETL test objects can be reduced, and test efficiency can be improved.

Description

ETL test method and device
Technical Field
The application relates to the technical field of data warehouse analysis, in particular to an ETL testing method and device.
Background
A data warehouse is a stand-alone data environment into which data is imported, through an extraction process, from online transaction environments, external data sources, and offline data storage media. Its purpose is to establish a structured data storage space that separates data from different sources, forms a uniform and effective data set, and finally processes and integrates the data into the form required.
ETL is the process of data extraction (Extract), cleaning (Cleaning), transformation (Transform), and loading (Load). It is an important link in constructing a data warehouse: the user extracts the required data from a data source, cleans it, and finally loads it into the data warehouse according to a predefined data warehouse model. In ETL testing, the test object usually consists of multiple scheduled tasks or of a single scheduled task, and a single task usually contains hundreds or thousands of lines of code.
In the prior art, the ETL testing process generally includes the following steps:
Step 1, analyzing the data mapping between source data and target data:
analyzing the source tables depended on, the target tables depended on, and the dependencies between them;
Step 2, designing test cases:
treating the stored procedure as a whole, focusing on its input and output, and designing test cases according to the business function;
Step 3, preparing data:
preparing data for each source table depended on;
Step 4, verifying the result:
comparing the source table with the target table to verify the result.
In ETL testing, a test object consists of multiple scheduled tasks, or of a single task containing hundreds or thousands of lines of code; the data volume is huge, and many source tables are associated with one another. In the prior art, the test object is tested as a whole, so the testing process is very complex and the workload is huge. If one small detail is wrong, the whole test result is wrong and the test must be carried out again, so the efficiency of the whole ETL test is very low. Therefore, one technical problem urgently to be solved by those skilled in the art is to provide an ETL test mechanism that reduces the complexity of the ETL test object and improves test efficiency.
Disclosure of Invention
The technical problem to be solved by the application is to provide an ETL test method, based on the application, the complexity of an ETL test object can be reduced, and the test efficiency is improved.
Correspondingly, the application also provides an ETL testing device for ensuring the realization and the application of the method in practice.
In order to solve the above problem, the present application discloses an ETL testing method, including:
splitting an ETL test object into test sub-objects according to a preset rule;
and testing the test sub-objects one by one.
Preferably, the step of splitting the ETL test object into test sub-objects according to preset rules includes:
acquiring the number of detachable test sub-objects in the ETL test object;
and splitting the ETL test object into the test sub-objects with the corresponding number according to the number of the detachable test sub-objects.
Preferably, the step of obtaining the number of detachable test sub-objects in the ETL test object further comprises:
dividing each service contained in the ETL test object to obtain the number of detachable test sub-objects;
or,
dividing each function contained in the ETL test object to obtain the number of detachable test sub-objects.
Preferably, the step of obtaining the number of detachable test sub-objects in the ETL test object includes:
acquiring information on the temporary tables and the target table of a stored procedure in the ETL test object;
and calculating the number of temporary tables plus one target table as the number of detachable test sub-objects.
Preferably, the step of splitting the ETL test object into test sub-objects according to the number of the test sub-objects that can be split includes:
extracting, from the code corresponding to the ETL test object, the code segments that insert into the same temporary table;
and merging the code segments that insert into the same temporary table to form a test sub-object.
Preferably, the code corresponding to the ETL test object consists of the code segments in the ETL test object's code that match the pattern from Insert to Select, and the test sub-object is an SQL script composed of such Insert-to-Select code segments.
Preferably, the step of testing the test sub-objects one by one includes:
preparing a test case and test data for the test sub-object;
and executing the test case adopting the test data to obtain a test result.
Preferably, the step of preparing the test case and the test data for the test sub-object includes:
writing a test case according to the test purpose;
and acquiring information on the source tables on which the current test sub-object depends.
Preferably, the method further comprises:
and packaging the test sub-objects, and executing the test cases corresponding to the test sub-objects in batch to obtain the test result of the ETL test object.
The present application further provides an ETL testing apparatus, comprising:
the splitting module is used for splitting the ETL test object into test sub-objects according to a preset rule;
and the testing module is used for testing the testing sub-objects one by one.
Compared with the prior art, the method has the following advantages:
the ETL test object is split into test sub-objects according to a preset rule, and the test sub-objects are then tested one by one, so that a complex test object is simplified, test complexity is reduced, and test efficiency is improved.
Testing the test sub-objects one by one narrows the scope of each individual test and pins the test scope down precisely. Each test sub-object is independent and can be tested on its own, and stubs can be placed at the points where a test sub-object depends on others. Because the scope of a single test sub-object is small, clear, and simple, a problem can be located quickly once it occurs.
Meanwhile, when test data is prepared, there is no need to start from the data of the original source tables; simple temporary-table data can be constructed according to the business, which reduces the cost of data construction.
Drawings
FIG. 1 is a flowchart of embodiment 1 of the ETL testing method of the present application;
FIG. 2 is a flowchart of embodiment 2 of the ETL testing method of the present application;
FIG. 3 is a block diagram of embodiment 1 of the ETL testing device of the present application;
FIG. 4 is a block diagram of embodiment 2 of the ETL testing device of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
To help those skilled in the art better understand the present application, the basic flow of ETL testing in a typical project is described below. ETL testing generally includes the following stages: requirement analysis, test analysis, standard data construction, test method, test case design, and test result verification and release.
(1) Requirement analysis
This stage requires familiarity with the business process and business rules.
(2) Test analysis
This stage determines the individual test points. ETL testing mainly includes routine ETL checks and business logic checks.
The routine checks include:
1. whether the ETL script runs without errors (run the script and review the execution plan);
2. whether the error handling mechanism of the ETL script is complete (code review);
3. whether the ETL script supports rollback.
The business logic checks include:
1. data volume check: checking whether the number of records is as expected;
2. uniqueness check: mainly checking whether the primary key is duplicated (for example, whether cookie_id or member_id is repeated);
3. business field conversion check: checking whether fields are converted and indicators are calculated correctly, by sampling a certain number of records from the source table and the target table and judging whether the field mapping (mapped fields) and the indicator calculation (calculated fields) are correct;
4. spot check: randomly taking several records to see whether there are garbled characters, abnormal data, and the like.
Test analysis also needs to determine the focus and scope of the ETL test: for the key business in the project, the complex logic is taken as the test focus, and use cases are designed for the test object indicator by indicator. In addition, the source tables need to be analyzed to clarify the associations between them, the target table needs to be analyzed, the mapping between the source tables and the target table needs to be analyzed according to the requirements, and the data flow graph of the business needs to be analyzed.
(3) Test method
The test method is query-based: the expected result is expressed in sql, so that the check itself stays unchanged when the data changes. This also facilitates regression. The test strategy is incremental testing, that is, testing step by step as work is submitted. This process requires a large number of temporary tables.
(4) Standard data set construction
Standard data set construction has two aspects: one is to extract online data directly, and the other is to use scripts to create abnormal data.
Online data is extracted through a database link (dblink). When extracting online data, attention must be paid to the comprehensiveness of the test data, that is, full coverage by the test data. For example, for the gender (sex) field, both the male and the female cases must be extracted, not only one of them; otherwise test data is missing. When extracting data from associated tables, the main table can be extracted first, and the sub-table can then be extracted conditionally according to the main table's data.
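The extraction order described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module rather than a dblink to a production database; the tables member and orders, their columns, and the sample rows are hypothetical and do not come from the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical "online" tables standing in for the production source.
    CREATE TABLE member (member_id INTEGER, sex TEXT);
    CREATE TABLE orders (order_id INTEGER, member_id INTEGER);
    INSERT INTO member VALUES (1, 'male'), (2, 'male'), (3, 'female');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3), (13, 3);

    -- Main table first: keep one member per distinct sex value for coverage.
    CREATE TABLE member_sample AS
        SELECT MIN(member_id) AS member_id, sex FROM member GROUP BY sex;

    -- Then the sub-table, restricted to the members already extracted.
    CREATE TABLE orders_sample AS
        SELECT o.* FROM orders o
        WHERE o.member_id IN (SELECT member_id FROM member_sample);
""")

# Coverage check: every sex value present online also appears in the sample.
sexes = {row[0] for row in conn.execute("SELECT sex FROM member_sample")}
order_ids = [row[0] for row in
             conn.execute("SELECT order_id FROM orders_sample ORDER BY order_id")]
print(sexes, order_ids)
```

Extracting the main table first keeps the sub-table sample consistent with it, so joins between the two sampled tables do not lose rows.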
Abnormal data can be created by considering several aspects: field type, field length, null values, business outliers, and unique constraint values.
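As a rough sketch of such a script, the function below builds one deliberately bad row per aspect. The row layout (id, name, gmv), the length limit, and the specific bad values are assumptions chosen for illustration only.

```python
def abnormal_rows(max_name_len=10):
    """Build one deliberately bad (id, name, gmv) row per aspect.

    The columns and the length limit are hypothetical examples.
    """
    return {
        "field type": (1, "alice", "not-a-number"),           # text where a number is expected
        "field length": (2, "x" * (max_name_len + 1), 100),   # one character too long
        "null value": (3, None, 100),                         # missing name
        "business outlier": (4, "bob", -1),                   # negative amount
        "unique constraint": (1, "carol", 100),               # repeats id 1
    }

rows = abnormal_rows()
for aspect, row in rows.items():
    print(aspect, row)
```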
(5) Test case design
Test cases can be designed individually or with a scheduling approach. With the scheduling approach, multiple cases can be verified at one time, and regression is also easier.
For example, the summary script is:
the test case 1 is:
(6) Test result verification
Test result verification comprises two steps:
The first step verifies whether the record counts are consistent; if they are not, there is certainly a problem, which must be investigated until the cause is found. The second step, taken when the record counts are consistent, checks whether the values are correct.
Two methods can be used:
Method 1: using MINUS
SELECT * FROM target_a
MINUS
SELECT * FROM test_map;
Note that the two tables must then be swapped and compared again. MINUS takes the first table as the baseline and finds the rows in the first table that do not match the second table; that is, it finds the rows of the sumdt0 table that are inconsistent with the map table.
Method 2: writing a script for verification.
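The need to compare in both directions can be illustrated with a small script. This is a sketch only: it uses Python's sqlite3 module, where the set-difference operator is spelled EXCEPT instead of MINUS, and the table columns and rows are made up for the example.

```python
import sqlite3

def diff_both_ways(conn, left, right):
    """Return (rows only in `left`, rows only in `right`).

    SQLite spells Oracle's MINUS as EXCEPT. The table names are trusted
    test fixtures here, so they are interpolated directly.
    """
    only_left = conn.execute(
        f"SELECT * FROM {left} EXCEPT SELECT * FROM {right}").fetchall()
    only_right = conn.execute(
        f"SELECT * FROM {right} EXCEPT SELECT * FROM {left}").fetchall()
    return only_left, only_right

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target_a (id INTEGER, gmv INTEGER);
    CREATE TABLE test_map (id INTEGER, gmv INTEGER);
    INSERT INTO target_a VALUES (1, 200), (2, 300);
    INSERT INTO test_map VALUES (1, 200), (2, 300), (3, 400);
""")

only_target, only_expected = diff_both_ways(conn, "target_a", "test_map")
print(only_target)    # rows in target_a that are missing from test_map
print(only_expected)  # rows in test_map that are missing from target_a
```

In this sample data the first direction finds nothing, and only the swapped comparison reveals the row missing from target_a, which is why a one-way comparison is not sufficient.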
The above is the general process used for ETL testing in the prior art, and it has the following problems:
1. Risk of missed branches. In the above process, one or more stored procedures are treated as a black box and the internal implementation logic is not examined, so branches may be missed. For example, if the stored procedure contains if/else conditions, a tester who does not know the internal implementation cannot be sure of covering every branch.
2. Complex data preparation. An ETL stored procedure generally involves one to n source tables. When the standard data set is constructed, data for all n tables must be prepared at once, even though some of that data may be used only in the last few steps of the stored procedure. The process is long, and if primary/foreign key relationships exist between the source tables, data preparation becomes even more complex and error-prone.
3. Difficult troubleshooting. The test object is treated as a whole, so once a bug appears it is hard to locate the faulty code immediately; breakpoint debugging is needed and the code must be checked line by line, which is time-consuming when there are many lines of code.
4. Testing cannot target the key points. Key items and complex logic in an ETL test should be the test focus. For example, when the top 100 records are taken, the sorting is the test focus; but the top-100 sorting by rownum is performed in the last step of the stored procedure, so with the existing test method the data has to be constructed all the way from the source.
5. Enlarged bug verification scope. With the prior art, testers cannot know exactly which code was modified when developers fixed a bug, and because testing is carried out on the whole, the test cannot be limited to the modified part; a full regression is needed, which enlarges the test scope.
In view of the above problems, the applicant of the present invention has creatively proposed a core concept of the present application, which is to split an ETL test object into test sub-objects according to a preset rule, and then test the test sub-objects one by one, thereby reducing the complexity of the ETL test object and improving the test efficiency.
Referring to fig. 1, a flowchart of embodiment 1 of the ETL testing method of the present application is shown, which may specifically include the following steps:
Step 101, splitting an ETL test object into test sub-objects according to a preset rule.
Splitting the ETL test object into test sub-objects according to the preset rule simplifies a complex test object and reduces test complexity. An ETL test object is as follows:
In a preferred embodiment of the present application, step 101 may include:
Sub-step S11, obtaining the number of test sub-objects into which the ETL test object can be split. In this step, the number of detachable test sub-objects is designed by analyzing the test object.
In a preferred embodiment of the present application, the sub-step S11 may include:
Sub-step S11-1, dividing the ETL test object by the businesses it contains to obtain the number of detachable test sub-objects;
or,
Sub-step S11-2, dividing the ETL test object by the functions it contains to obtain the number of detachable test sub-objects.
The ETL test object completes data extraction, conversion, and loading, and comprises multiple modules that implement its businesses or functions. Designing the number of test sub-objects is critical; the more test sub-objects there are, the better. In a specific implementation, the test object can be divided by function, with each resulting module called a test sub-object, or divided by business, with each resulting module called a test sub-object. Each test sub-object implements a certain function or business.
For example, task a is composed of 3 subtasks, each subtask correspondingly completes a function, and at this time, the test of task a can be split into 3 test subtasks.
Sub-step S12, splitting the ETL test object into the corresponding number of test sub-objects according to the number of detachable test sub-objects.
In a preferred embodiment of the present application, sub-step S12 may include:
Sub-step S12-1, extracting, from the code corresponding to the ETL test object, the code segments that insert into the same temporary table;
Sub-step S12-2, merging the code segments that insert into the same temporary table to form a test sub-object.
In a specific implementation, the code corresponding to the ETL test object may be the code segments in the ETL test object's code that match the pattern from Insert to Select, and the test sub-object may be an SQL script composed of such Insert-to-Select code segments.
As an example of step 101, suppose the test object is divided by business or function into three test sub-objects, and the code segments matching the pattern from Insert to Select are taken as the code corresponding to the ETL test object. This code involves two temporary tables, temp0 and temp1, and one target table, target01. The code segments corresponding to each temporary table are extracted and merged into a test sub-object: the code segments that insert into table temp0 are merged into test sub-object 1, the code segments that insert into table temp1 are merged into test sub-object 2, and the remaining code segments, which insert into the target table, are merged into test sub-object 3. Each test sub-object takes the form of an SQL script composed of Insert-to-Select code segments, as follows.
Test sub-object 1:
insert into table_temp0
select a.* from table01 a;
Test sub-object 2:
insert into table_temp1
select a.* from table_temp0 a;
Test sub-object 3:
insert into table_target01
select a.* from table_temp1 a;
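The extract-and-merge idea of sub-steps S12-1 and S12-2 can be sketched in a few lines. This is only an illustration under simplifying assumptions: statements are separated by semicolons, every relevant statement starts with insert into, and the extra source table table02 is hypothetical. A real stored procedure would need a proper SQL parser.

```python
import re

# Matches the table name that an "insert into ..." statement writes to.
INSERT_TARGET = re.compile(r"^\s*insert\s+into\s+(\w+)", re.IGNORECASE)

def split_into_sub_objects(script):
    """Group insert-select statements by the table they insert into.

    Naive sketch: splits on ';', so it does not handle semicolons inside
    string literals or procedural blocks.
    """
    groups = {}
    for statement in script.split(";"):
        match = INSERT_TARGET.match(statement)
        if match:
            table = match.group(1).lower()
            groups.setdefault(table, []).append(statement.strip() + ";")
    return groups

script = """
insert into table_temp0 select a.* from table01 a;
insert into table_temp0 select b.* from table02 b;
insert into table_temp1 select a.* from table_temp0 a;
insert into table_target01 select a.* from table_temp1 a;
"""

sub_objects = split_into_sub_objects(script)
for table, statements in sub_objects.items():
    print(table, len(statements))
```

Each dictionary entry corresponds to one test sub-object: all statements that write to the same table are kept together, in their original order.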
In practical applications, if the ETL test object is complex and each test sub-object is still large after one split, the test sub-objects can be split one or more further times; the application places no limit on this. For example, the ETL test object may be split into test sub-objects 1, 2, and 3, and test sub-object 1 may then be split into test sub-objects 1.1, 1.2, and 1.3. Note that when some test sub-objects depend deeply on one another, those test sub-objects need to be tested together as a whole.
Step 102, testing the test sub-objects one by one.
In a preferred embodiment of the present application, step 102 may comprise:
Sub-step S31, preparing test cases and test data for the test sub-objects. The test cases and test data are used to complete the testing of the test sub-objects.
In a preferred embodiment of the present application, sub-step S31 may include:
Sub-step S31-1, writing test cases according to the test purpose. The test purpose mainly covers routine checks and business logic checks, for example uniqueness checks, business field conversion checks, and data volume checks. In a specific implementation, the test cases may be written with a UE tool or a notepad, and they are expressed in the form of sql.
For example, the test cases designed for test sub-object 1 are:
Test case 1: uniqueness check
Select count(0) from table_temp0 a group by a.id having count(0) > 1;
Test case 2: indicator check
Insert into result
select id, sum(a.gmv) as gmv from table01 a group by a.id;
Sub-step S31-2, obtaining information on the source tables on which the current test sub-object depends. In a specific implementation, data can be prepared directly with sql statements, such as insert into table (a, b) values (a, b); alternatively, a stored procedure or a java program can be written, for example with loops that build the Cartesian product of certain fields, assign values, and so on, and generate the SQL statements that prepare the data. The prepared data is test data for a single test sub-object only, so its structure is simple.
For example, analysis shows that the source table on which test sub-object 1 depends is table01, so the data preparation script for table01 is:
Insert into table01(id,gmv)values(1,200);
Insert into table01(id,gmv)values(2,300);
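Generating such insert statements from a Cartesian product of field values might look like the sketch below. It is written in Python rather than as a stored procedure or java program, it assumes all values are numeric so that no quoting is needed, and the helper name insert_statements is hypothetical.

```python
from itertools import product

def insert_statements(table, columns):
    """Yield one insert statement per combination of candidate values.

    `columns` maps each column name to a list of numeric candidate values.
    Sketch only: values are assumed numeric, so no SQL quoting is done.
    """
    names = list(columns)
    for values in product(*(columns[name] for name in names)):
        rendered = ", ".join(str(value) for value in values)
        yield f"insert into {table}({', '.join(names)}) values({rendered});"

statements = list(insert_statements("table01", {"id": [1, 2], "gmv": [200, 300]}))
for statement in statements:
    print(statement)
```

Two candidate values for each of two columns give four statements, covering every id and gmv combination for the single source table that the test sub-object depends on.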
Sub-step S32, executing the test case with the test data to obtain a test result. The test data is fed to the test case, the test case is executed, and its output is checked, which completes the test of the corresponding test sub-object.
In a specific implementation, the following two methods can be used to check the operation result of the test case:
1. Direct comparison. The database's minus function is used to check whether the value of each field in source table a and target table b is equal, for example: Select col1, col2 from a minus Select col1, col2 from b.
2. Writing a stored procedure or a java program, for example: comparing the records of source table a and target table b field by field in a for loop.
As above, the test case 1 is executed in an sql environment:
SQL> Select count(0) from table_temp0 a group by a.id having count(0) > 1;
Checking the result: if the query returns no rows, that is, no id occurs more than once, the test case passes.
Next, test case 2 is executed in the sql environment:
SQL> Insert into result
select id, sum(a.gmv) as gmv from table01 a group by a.id;
Checking the result:
select a.id, a.gmv from result a
minus
select b.id, b.gmv from table_temp0 b;
The test result of the test sub-object is thus obtained, and the test result of the ETL test object is obtained by testing all the test sub-objects. Each test sub-object is independent: once a problem occurs, the test sub-object containing the problem is identified, and only that test sub-object is retested. If a key functional module is to be tested, the corresponding test sub-object can be extracted and tested on its own.
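The whole cycle for test sub-object 1 (prepare data, run the sub-object, run both test cases) can be reproduced in a throwaway database. The sketch below uses Python's sqlite3 module, so minus is written as EXCEPT; the column types are an assumption, since the application does not give table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed table definitions; the source does not specify column types.
    CREATE TABLE table01 (id INTEGER, gmv INTEGER);
    CREATE TABLE table_temp0 (id INTEGER, gmv INTEGER);
    CREATE TABLE result (id INTEGER, gmv INTEGER);

    -- Test data for the source table of test sub-object 1.
    INSERT INTO table01(id, gmv) VALUES (1, 200);
    INSERT INTO table01(id, gmv) VALUES (2, 300);

    -- Test sub-object 1.
    INSERT INTO table_temp0 SELECT a.* FROM table01 a;

    -- Test case 2: expected values computed independently from the source.
    INSERT INTO result SELECT id, SUM(a.gmv) AS gmv FROM table01 a GROUP BY a.id;
""")

# Test case 1: uniqueness check. Any returned row is a duplicated id.
duplicates = conn.execute(
    "SELECT COUNT(0) FROM table_temp0 a GROUP BY a.id HAVING COUNT(0) > 1"
).fetchall()

# Test case 2: indicator check. Any returned row is an expected value that
# the sub-object did not produce (EXCEPT is SQLite's spelling of MINUS).
mismatches = conn.execute(
    "SELECT a.id, a.gmv FROM result a EXCEPT SELECT b.id, b.gmv FROM table_temp0 b"
).fetchall()

print("test case 1 passed:", duplicates == [])
print("test case 2 passed:", mismatches == [])
```

Both queries are written so that an empty result means success, which makes the pass condition the same for every test case.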
Referring to fig. 2, a flowchart of embodiment 2 of the ETL testing method of the present application is shown, which may specifically include the following steps:
Step 201, splitting an ETL test object into test sub-objects according to a preset rule;
In a preferred embodiment of the present application, step 201 may include:
Sub-step S41, obtaining the number of detachable test sub-objects in the ETL test object;
Sub-step S42, splitting the ETL test object into the corresponding number of test sub-objects according to the number of detachable test sub-objects.
In a preferred embodiment of the present application, the sub-step S41 may include:
Sub-step S41-1, obtaining information on the temporary tables and the target table of the stored procedure in the ETL test object;
Taking the ETL test object in the example above:
it includes two temporary tables, temp0 and temp1, and one target table, target01.
Sub-step S41-2, calculating the number of temporary tables plus one target table as the number of detachable test sub-objects.
As in the above example, the test object has 2 temporary tables; adding 1 target table gives 3, so the number of detachable test sub-objects is 3.
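The counting rule of sub-step S41-2 is simple enough to state as a function. In this sketch the caller supplies the list of tables written to by the stored procedure and a predicate that recognizes temporary tables; identifying them by the substring "temp" is an assumption made for the example, not something the application prescribes.

```python
def count_sub_objects(insert_targets, is_temporary):
    """Number of detachable test sub-objects: distinct temporary tables plus one target table."""
    temp_tables = {name for name in insert_targets if is_temporary(name)}
    return len(temp_tables) + 1

# Tables written to by the example stored procedure; repeats are expected,
# because several code segments may insert into the same temporary table.
targets = ["table_temp0", "table_temp0", "table_temp1", "table_target01"]

# Assumed naming convention: temporary tables contain "temp" in their name.
count = count_sub_objects(targets, lambda name: "temp" in name)
print(count)
```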
Step 202, testing the test sub-objects one by one.
Step 203, packaging the test sub-objects, and executing the test cases corresponding to the test sub-objects in batch to obtain the test results of the ETL test objects.
After the test sub-objects have been tested individually, an overall test, also called an integration test, can be performed: the test sub-objects produced by the splitting method are packaged into an sql file, and the test cases written in sql are executed in batch in a database environment to complete the overall test.
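A batch run of this kind might be sketched as follows, again with Python's sqlite3 module. The two test cases, the table definitions, and the convention that a test case passes when its query returns no rows are assumptions for illustration; EXCEPT replaces minus because of the SQLite dialect.

```python
import sqlite3

# The three test sub-objects, packaged together like one sql file.
SUB_OBJECTS = [
    "insert into table_temp0 select a.* from table01 a;",
    "insert into table_temp1 select a.* from table_temp0 a;",
    "insert into table_target01 select a.* from table_temp1 a;",
]

# Hypothetical test cases: each query returns rows only when the check fails.
TEST_CASES = {
    "temp0_unique_id":
        "select a.id from table_temp0 a group by a.id having count(0) > 1",
    "target_matches_source":
        "select * from table01 except select * from table_target01",
}

def run_batch(conn, sub_objects, test_cases):
    """Run every sub-object in order, then every test case; True means passed."""
    conn.executescript("\n".join(sub_objects))
    return {name: conn.execute(sql).fetchall() == []
            for name, sql in test_cases.items()}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table01 (id integer, gmv integer);
    create table table_temp0 (id integer, gmv integer);
    create table table_temp1 (id integer, gmv integer);
    create table table_target01 (id integer, gmv integer);
    insert into table01(id, gmv) values (1, 200);
    insert into table01(id, gmv) values (2, 300);
""")

results = run_batch(conn, SUB_OBJECTS, TEST_CASES)
print(results)
```

Keeping the sub-objects and test cases as plain data makes it easy to rerun only one sub-object and its cases after a fix, while the same structures still drive the overall run.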
In summary, the present application provides an ETL testing method, which splits an ETL test object into test sub-objects according to preset rules, and then tests the test sub-objects one by one, so that the complex test objects are simplified, the test complexity is reduced, and the test efficiency is improved.
Testing the test sub-objects one by one narrows the scope of each individual test, pins the test scope down precisely, and keeps the test sub-objects independent of one another during testing. Each test sub-object can be tested on its own, and stubs can be placed at the points where a test sub-object depends on others. Because the scope of a single test sub-object is small, clear, and simple, a problem can be located quickly once it occurs.
Meanwhile, when test data is prepared, there is no need to start from the data of the original source tables; simple temporary-table data can be constructed according to the business, which reduces the cost of data construction.
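Stubbing at a dependency point can be illustrated with test sub-object 2 from the earlier example: instead of running test sub-object 1 on real source data, the temporary table it would have produced is filled directly. The sketch uses Python's sqlite3 module, and the column definitions and rows are assumed for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed table definitions for the example.
    CREATE TABLE table_temp0 (id INTEGER, gmv INTEGER);
    CREATE TABLE table_temp1 (id INTEGER, gmv INTEGER);

    -- Stub: fill the dependency directly instead of running test sub-object 1
    -- against the original source table.
    INSERT INTO table_temp0 VALUES (1, 200), (2, 300);

    -- Test sub-object 2, the code under test.
    INSERT INTO table_temp1 SELECT a.* FROM table_temp0 a;
""")

loaded = conn.execute("SELECT id, gmv FROM table_temp1 ORDER BY id").fetchall()
print(loaded)
```

Only the table that test sub-object 2 reads from has to be populated, so no data for table01 or any earlier source is needed.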
ETL testing is a relatively new field, and few test methods target it. The inventor of the present application studied test methods for the objects of ETL testing, split the object under test for the first time, and tests with the idea of dividing the whole into sub-objects. The method suits the characteristics of objects in the ETL field very well: many source tables are involved and they are associated with one another; the data volume is large, at the level of tens of millions of records or even terabytes; and the code runs to hundreds or thousands of lines. The application provides a new solution in terms of testing idea, testing mode, and testing concept.
It is noted that, for simplicity of explanation, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will appreciate that the present application is not limited by the order of acts, as some steps may, in accordance with the present application, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Referring to fig. 3, a block diagram of a structure of an embodiment 1 of the ETL testing apparatus of the present application is shown, which may specifically include the following modules:
the splitting module 301 is configured to split the ETL test object into test sub-objects according to preset rules;
a testing module 302, configured to perform testing on the testing sub-objects one by one.
In a preferred embodiment of the present application, the splitting module 301 may include:
the number obtaining submodule is used for obtaining the number of the test sub-objects which can be split in the ETL test objects;
and the test object splitting sub-module is used for splitting the ETL test object into the test sub-objects with the corresponding number according to the number of the split test sub-objects.
In a preferred embodiment of the present application, the number obtaining sub-module may further include: a business division sub-module, configured to divide the ETL test object by the businesses it contains to obtain the number of detachable test sub-objects;
or,
a function division sub-module, configured to divide the ETL test object by the functions it contains to obtain the number of detachable test sub-objects.
In a preferred embodiment of the present application, the test object splitting sub-module may include:
the code segment extraction sub-module is used for extracting, from the code corresponding to the ETL test object, the code segments that insert into the same temporary table;
and the code segment merging sub-module is used for merging the code segments that insert into the same temporary table to form a test sub-object.
In a specific implementation, the code corresponding to the ETL test object consists of the code segments in the ETL test object's code that match the pattern from Insert to Select, and the test sub-object is an SQL script composed of such Insert-to-Select code segments.
In a preferred embodiment of the present application, the test module 302 may include:
the test preparation submodule is used for preparing a test case and test data aiming at the test subobject;
and the test case execution submodule is used for executing the test case adopting the test data to obtain a test result.
In a specific implementation, the test preparation sub-module may include:
the test case writing sub-module is used for writing the test case according to the test purpose;
and the source table information acquisition sub-module is used for acquiring information on the source tables on which the current test sub-object depends.
Referring to fig. 4, a block diagram of a structure of an embodiment 2 of the ETL testing apparatus of the present application is shown, which may specifically include the following modules:
a splitting module 401, configured to split the ETL test object into test sub-objects according to a preset rule;
a testing module 402, configured to perform testing on the testing sub-objects one by one;
a test result obtaining module 403, configured to encapsulate the test sub-objects, and execute the test cases corresponding to the test sub-objects in batch to obtain the test result of the ETL test object.
In a preferred embodiment of the present application, the splitting module 401 may include:
the number obtaining submodule is used for obtaining the number of detachable test sub-objects in the ETL test object;
and the test object splitting sub-module is used for splitting the ETL test object into the corresponding number of test sub-objects according to the number of detachable test sub-objects.
In a specific implementation, the number obtaining sub-module may include:
the information acquisition submodule is used for acquiring information on the temporary tables and the target table of the stored procedure in the ETL test object;
and the number calculation submodule is used for calculating the number of temporary tables plus one target table as the number of detachable test sub-objects.
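The counting rule can be shown with a short worked example, assuming the names of the tables written by the stored procedure have already been collected. The function `count_sub_objects` and the table names are hypothetical.

```python
from typing import Iterable


def count_sub_objects(temp_tables: Iterable[str], target_table: str) -> int:
    """Number of detachable test sub-objects: temporary tables plus one target table."""
    target = target_table.lower()
    # De-duplicate case-insensitively: a temporary table written by several
    # code segments still counts once. The target table is counted separately.
    distinct_temp = {name.lower() for name in temp_tables} - {target}
    return len(distinct_temp) + 1


# Hypothetical stored procedure writing two temporary tables and one target table.
n = count_sub_objects(["tmp_orders", "tmp_users", "tmp_orders"], "dw_report")
print(n)  # 3
```

Here two distinct temporary tables plus the single target table give three test sub-objects.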
Since the apparatus embodiments basically correspond to the method embodiments shown in fig. 1 and fig. 2, they are described only briefly; for details, reference may be made to the related description in the foregoing method embodiments, which is not repeated herein.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Finally, it should also be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The ETL testing method and the ETL testing apparatus provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the descriptions of the above embodiments are intended only to help understand the method and core ideas of the present application. Meanwhile, a person skilled in the art may, following the idea of the present application, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A method of ETL testing, comprising:
splitting an ETL test object into test sub-objects according to a preset rule;
and testing the test sub-objects one by one.
2. The method of claim 1, wherein the step of splitting the ETL test object into test sub-objects according to preset rules comprises:
acquiring the number of detachable test sub-objects in the ETL test object;
and splitting the ETL test object into the test sub-objects with the corresponding number according to the number of the detachable test sub-objects.
3. The method of claim 2, wherein the step of obtaining the number of detachable test sub-objects in the ETL test object further comprises:
dividing each service contained in the ETL test object to obtain the number of detachable test sub-objects;
or,
dividing each function contained in the ETL test object to obtain the number of detachable test sub-objects.
4. The method of claim 2, wherein the step of obtaining the number of detachable test sub-objects in the ETL test object comprises:
acquiring information on the temporary tables and the target table of the stored procedure in the ETL test object;
and calculating the number of temporary tables plus one target table as the number of detachable test sub-objects.
5. The method of claim 3 or 4, wherein the step of splitting the ETL test object into the corresponding number of test sub-objects according to the number of detachable test sub-objects comprises:
extracting, from the code corresponding to the ETL test object, the code segments that insert into the same temporary table;
and merging the code segments that insert into the same temporary table to form a test sub-object.
6. The method of claim 5, wherein the code corresponding to the ETL test object consists of the code segments in the code of the ETL test object that match Insert into ... Select, and the test sub-object is an SQL script composed of Insert into ... Select code segments.
7. The method of claim 1, wherein the step of testing the test sub-objects one by one comprises:
preparing a test case and test data for the test sub-object;
and executing the test case with the test data to obtain a test result.
8. The method of claim 7, wherein the step of preparing test cases and test data for the test sub-objects comprises:
writing a test case according to a test purpose;
and acquiring information on the source tables on which the current test sub-object depends.
9. The method of claim 1, 2, 3, 4, 6, 7, or 8, further comprising:
and encapsulating the test sub-objects, and executing the test cases corresponding to the test sub-objects in batch to obtain the test result of the ETL test object.
10. An apparatus for ETL testing, comprising:
the splitting module is used for splitting the ETL test object into test sub-objects according to a preset rule;
and the testing module is used for testing the test sub-objects one by one.
HK13103671.5A 2013-03-25 Method and device for an extraction- transformation-loading test HK1176710B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110218915.6A CN102915303B (en) 2011-08-01 2011-08-01 A kind of method and apparatus of ETL test

Publications (2)

Publication Number Publication Date
HK1176710A1 (en) 2013-08-02
HK1176710B (en) 2017-02-17

Also Published As

Publication number Publication date
CN102915303A (en) 2013-02-06
CN102915303B (en) 2016-04-20

Legal Events

Date Code Title Description
PC Patent ceased (i.e. patent has lapsed due to the failure to pay the renewal fee)

Effective date: 20240731