Test Case Reduction Using Data Mining Technique
Ahmad A. Saifan et al.
ABSTRACT
Software testing is a process of verifying the functionality of software. It is a crucial activity
that consumes a great deal of time and cost, much of it spent executing large numbers of unreliable
test cases. Our goal is to reduce that number and offer more reliable test cases, which can be
achieved by using selection techniques to choose a subset of the existing test cases. The main goal
of test case selection is to identify a subset of the test cases that satisfies the requirements
while exposing most of the existing faults. Cyclomatic complexity and code coverage are the state of
practice among test case selection heuristics. We used a clustering algorithm, a data mining
approach, to reduce the number of test cases. Our approach obtained 93 unique effective test cases
out of a total of 504.
Keywords: test case reduction, clustering, cyclomatic complexity, coverage, redundant test cases.
1. INTRODUCTION
Software quality is an important issue that all developers of software systems want to achieve. It
currently attracts a great deal of attention, since software is everywhere and affects our lives on a
daily basis. Software testing is the main factor in enhancing and increasing the quality of software,
and it requires generating different test cases according to certain coverage criteria such as
graph, logic, input space, and syntax coverage (Ammann & Offutt, 2008). The size and complexity
of software systems are growing dramatically; in addition, the existence of automated tools
leads to the generation of a huge number of test cases, whose execution causes huge losses in
cost and time (Lilly & Uma, 2010). According to Rothermel et al. (Rothermel et al., 2001), a
product of about 20,000 lines of code requires seven weeks to run all its test cases. Ultimately, the
challenge is to find a way to reduce the number of test cases, or to order them, in order to validate
the system being tested.
The main goal of software testing is to ensure that the software is almost free from errors. The test
process can be said to be effective when the test cases are able to locate any errors. Several tools
in the literature automatically generate thousands of test cases for a simple program in a few
seconds, but executing those test cases takes a great deal of time. Moreover, the tools may also
generate redundant test cases (Muthyala et al., 2011). The problem is compounded for complex
systems, where the execution of the test cases may take several days to complete. Moreover, most of
that time is spent executing redundant or unnecessary test cases.
To identify the redundant test cases, a technique such as data mining (Lilly & Uma, 2010) is
required to understand the properties of the test cases, with a view to determining the similarities
between them and removing the redundant ones.
This paper deals with this issue of reducing the number of test cases in order to minimise the
time and cost of executing them. Several techniques can be used to reduce test cases, such as
information retrieval, pairwise testing (Yoo et al., 2009) and data mining. We used the data mining
approach, mainly because of its ability to extract patterns of test cases that are otherwise
invisible.
We present our approach, concentrating on the two most effective attributes of test cases: coverage
and complexity (Kameswari et al., 2011). An empirical study presented in (Jeffrey & Gupta, 2007)
suggested that during test case reduction, using several coverage criteria rather than a single one
is more effective in selecting test cases that are able to expose different faults.
We start by collecting the test cases for a given system and then build the dataset from their
coverage and complexity. Next, we use a data mining technique, K-means clustering, to group test
cases into clusters. Finally, redundant test cases that have the same distance to the cluster
centre point are removed. To evaluate our approach, we compare the coverage ratio of the original
test cases with the coverage ratio of the reduced test cases.
A set of test cases is defined to test program traces, including instructions, methods and branches.
The set of test cases is defined as T = {t1, t2, …, tn}, where each ti = {a1, a2, …, am} is a test
case defined to cover one of these trace types (instructions, methods or branches).
In order to reduce the number of generated test cases according to their coverage, K-means
clustering is applied using the Euclidean distance between any given test case and the centroid.
Specifically, the set of test cases is divided among k centroids, and the distance between every
test case and its related centroid is computed.
A test case ti is considered redundant with tj if dist(ti, centroid) = dist(tj, centroid), where
ti and tj belong to the same cluster. In this case, the test case with the minimum cyclomatic
complexity of ti, tj is kept.
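The redundancy rule above can be sketched in Python (an illustrative sketch; the feature vectors, test case names and the rounding tolerance used to group equal distances are our assumptions, not part of the paper):

```python
import math

def dist(tc, centroid):
    """Euclidean distance between a test case's feature vector and a centroid."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(tc["features"], centroid)))

def drop_redundant(test_cases, centroid):
    """Among the test cases of one cluster, keep a single representative per
    distance value: the one with the lowest cyclomatic complexity."""
    by_distance = {}
    for tc in test_cases:
        d = round(dist(tc, centroid), 6)  # group equal distances (assumed tolerance)
        best = by_distance.get(d)
        if best is None or tc["complexity"] < best["complexity"]:
            by_distance[d] = tc
    return list(by_distance.values())

# Hypothetical cluster: t1 and t2 lie at the same distance from the centroid.
cluster = [
    {"name": "t1", "features": [3.0, 4.0], "complexity": 5},
    {"name": "t2", "features": [4.0, 3.0], "complexity": 2},
    {"name": "t3", "features": [6.0, 8.0], "complexity": 7},
]
kept = drop_redundant(cluster, centroid=[0.0, 0.0])
print(sorted(tc["name"] for tc in kept))  # ['t2', 't3']
```

Of the two equidistant cases, t2 survives because its cyclomatic complexity is lower.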
2.2 COMPARISON
Much research in the literature has investigated enhancing software testing through the reduction
of test cases using clustering data mining techniques. The goal of all these techniques is to
remove the redundant test cases and keep the reliable and effective ones.
(Muthyala et al., 2011; Kameswari et al., 2011; Dash et al., 2012) proposed models that follow the
same procedure for reducing test cases using the k-means algorithm based on path coverage, while
(Chantrapornchai et al., 2014) used branch coverage. The model starts by identifying a centroid for
each cluster. Each test case is then associated with its nearest centroid, and the new centroid of
each cluster is recalculated. However, these approaches were verified on a simple system containing
only two variables, which makes it hard to apply the technique to more complex systems. Moreover,
they identify the number of clusters in advance (i.e. before the reduction process starts) based on
the prime paths generated from the code. In contrast, our approach works in a different way. We
begin by computing the test suite coverage and cyclomatic complexity. Then we apply k-means
clustering to the test suite based on the distance between the cluster centroid and the test cases.
Test cases that have the same distance are called redundant, and from each group of redundant test
cases we pick the one that has the lowest cyclomatic complexity. Moreover, we used real test cases
for a system that is more complex than the systems used in the previous works, e.g. a system that
has only two variables.
Another approach, presented in (Chantrapornchai et al., 2014), reduces the number of test cases
using the k-means clustering algorithm based on branch coverage. It follows the same procedure
discussed in (Muthyala et al., 2011; Kameswari et al., 2011; Dash et al., 2012). They applied their
approach to a very simple system, "valued added tax for liquor chilled", and were able to reduce the
number of test cases from 648 to 324 with almost the same code coverage before and after the
reduction process.
(Carlson et al., 2011) proposed a clustering approach that prioritizes test cases based on three
different sources of information: code coverage (method coverage), complexity and historical data on
real faults. The main purpose of their approach is to show how clustering can improve the rate of
fault detection of test cases. Their technique utilizes clustering for test case prioritization, but
the primary difference from our approach is that they only use method coverage for clustering. In
contrast, our approach applies test case reduction based on branch coverage, path coverage and
method coverage.
3. RELATED WORK
The first formal definition of the test suite reduction problem was introduced in 1993 by Harrold et
al. (Harrold et al., 1993) as follows. Given: a test suite T = {t1, t2, …, tm} of m test cases and a
set of test requirements {r1, r2, …, rn} that must be satisfied in order to provide the desired
coverage of the program entities, where each subset Ti of T is associated with one of the ri such
that every test case tj belonging to Ti satisfies ri. Problem: find a minimal test suite T′ from T
which satisfies all the ri covered by the original suite T.
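Finding a truly minimal T′ is NP-hard in general (the problem reduces to minimum set cover), so a common greedy approximation repeatedly picks the test case that satisfies the most still-unsatisfied requirements. The sketch below illustrates that standard heuristic, not the specific algorithm of Harrold et al.; the suite and requirement names are hypothetical:

```python
def greedy_reduce(coverage):
    """Greedy approximation of a minimal representative set.
    coverage maps a test case name to the set of requirement ids it satisfies."""
    uncovered = set().union(*coverage.values())
    reduced = []
    while uncovered:
        # Pick the test case that satisfies the most still-unsatisfied requirements.
        best = max(coverage, key=lambda t: len(coverage[t] & uncovered))
        if not coverage[best] & uncovered:
            break  # remaining requirements are unsatisfiable
        reduced.append(best)
        uncovered -= coverage[best]
    return reduced

suite = {
    "t1": {"r1", "r2"},
    "t2": {"r2", "r3"},
    "t3": {"r1", "r2", "r3"},
    "t4": {"r4"},
}
print(greedy_reduce(suite))  # ['t3', 't4'] covers all four requirements
```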
In general, there are two types of test case reduction technique: pre-process and post-process
(Roongruangsuwan & Daengdej, 2010). The pre-process technique reduces the test cases immediately
after their generation and before their execution; the post-process technique reduces the number of
test cases after the test is run.
Several techniques have been proposed in the literature, including heuristic algorithms (Dickinson
et al., 2001), a genetic algorithm based approach (Mansour & El-Fakih, 1999), an integer linear
programming approach (Black et al., 2004), and a hybrid approach (Yoo & Harman, 2010). A survey in
(Yoo & Harman, 2010) summarizes these techniques.
A great deal of work has been carried out in the area of test suite reduction; however, this has mainly
been in the area of automated software testing to avoid manual testing through the generation of test
cases from UML diagrams such as state machine diagrams, use case diagrams, sequence diagrams,
etc. (Lilly & Uma, 2010; Sharma & Sharma, 2011; Sawant & Shah, 2011; Heumann, 2001).
Other researchers have worked on test case reduction to improve system testing processes such as
regression testing (Pravin & Srinivasan, 2013). In order to find a more efficient algorithm for
reducing test cases, they presented an approach that assigns a priority to each test case. Priority
is given depending upon code coverage, and the test cases with higher priority are selected for the
reduced test suites. To demonstrate the effectiveness of their algorithm, the approach is applied to
two applications.
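The priority idea can be sketched as follows (illustrative Python; the coverage values and the rule of keeping the top-ranked cases are our assumptions, since the paper's exact priority function is not reproduced here):

```python
def prioritize(test_cases, keep):
    """Rank test cases by code coverage (descending) and keep the top ones.
    A sketch of coverage-based prioritization; not the authors' exact rule."""
    ranked = sorted(test_cases, key=lambda t: t["coverage"], reverse=True)
    return ranked[:keep]

# Hypothetical test cases with a single coverage score each.
cases = [
    {"name": "t1", "coverage": 40},
    {"name": "t2", "coverage": 90},
    {"name": "t3", "coverage": 65},
]
print([t["name"] for t in prioritize(cases, keep=2)])  # ['t2', 't3']
```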
(Maung & Win, 2015) proposed an entropy-based test case reduction approach to reduce test suite
size for website testing. A higher entropy value indicates that more URLs are covered. They
calculate the entropy value based on the dependent pages of a structural analysis, and the total
number of links is used to normalize the entropy value to the range zero to one. The user session
with the maximum entropy value, covering all or most URLs, is selected as a test case.
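One plausible reading of this normalization, sketched under our own assumptions (Maung & Win's exact formula may differ), is the Shannon entropy of a session's URL-visit distribution divided by the log of the total number of links:

```python
import math

def normalized_entropy(visits, total_links):
    """Shannon entropy of a session's URL-visit counts, normalized to [0, 1]
    by log(total_links). An assumed formula for illustration only."""
    total = sum(visits.values())
    probs = [n / total for n in visits.values() if n > 0]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(total_links) if total_links > 1 else 0.0

# A session spread over many URLs scores higher than one stuck on a single page.
broad = {"/home": 1, "/shop": 1, "/cart": 1, "/pay": 1}
narrow = {"/home": 4}
print(normalized_entropy(broad, 10) > normalized_entropy(narrow, 10))  # True
```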
(Roongruangsuwan & Daengdej, 2010) used an Artificial Intelligence based algorithm, Case-Based
Reasoning (CBR), for test case reduction. This algorithm finds the most similar test cases in the
case storage and removes the redundant ones using the CBR deletion algorithms. The CBR stores the
test cases it finds in order to learn from experience. They proposed three reduction methods that
are applied in the CBR deletion algorithms: the Test Case Complexity for Filtering (TCCF), Test Case
Impact for Filtering (TCIF) and Path Coverage for Filtering (PCF) methods. These techniques aim to
reduce the number of test cases generated by the path-oriented test case generation technique.
(Muthyala et al., 2011) proposed a new approach to reduce the number of test cases using data mining
techniques, applying both the Simple K-means and pickupCluster clustering algorithms. (Saifan, 2016)
produced an approach that uses data mining classifier techniques to reduce the test cases.
An approach similar to ours was discussed by (Subashini & JeyaMala, 2014). They used the path
coverage criterion to generate a set of test cases, and then proposed a clustering technique to
reduce their number, applied to very simple programs and based on the path coverage criterion only.
In our work, however, we use both the coverage criteria and complexity in the clustering technique.
(Rashi et al., 2014) proposed a new approach for test suite reduction using a density-based
clustering technique. They started by generating a set of test cases using the Selenium tool, before
loading the test cases into Weka and applying the DBSCAN clustering algorithm to them. Finally, an
appropriate filter was used to remove redundant test cases. In this paper we use knowledge mining
techniques, specifically the K-means clustering algorithm, in order to reduce the number of test
cases and choose those which are most effective for fault detection.
4. THE PROPOSED APPROACH

Our approach consists of the following five steps:

1. Collect a set of test cases from a given Java source code
2. Extract the complexity and the coverage for each test case to build the dataset
3. Apply K-mean clustering method to the dataset
4. Eliminate the redundant test cases using distance from the centre point
5. Analyse the results
Figure 1 shows the steps of our approach and the tools used to perform each step. We will now
describe each of these steps in detail.
4.1 PHASE 1: COLLECT A SET OF TEST CASES FROM A GIVEN JAVA SOURCE
CODE
Our approach starts with selecting the source code. In this paper, we select a Java source called
"Cinema". Table 1 shows some of the properties of the source code, such as the system classes, lines
of code, and the number of methods in each class. To run the system we use Eclipse SDK 3.7.2
(Eclipse, 2016), an integrated development environment (IDE) written mostly in the Java programming
language, which contains a base workspace and an extensible plug-in system for customizing the
environment.
Next we select a set of test cases for the given system. Our test cases come from different sources,
some being manually generated and others automatically generated using the CodePro AnalytiX tool
(CodePro, 2016). CodePro AnalytiX is an automated software quality and testing tool for Eclipse
developers which contains many key features such as error detection, static code analysis, code
metrics, and test generation. In this paper we use one of its most important features: test case
generation using JUnit. The tool automatically generates a set of test cases for any given input
class.
After the manual generation and the application of the tool we obtained 21 test suites, each with a
different number of test cases. Table 2 shows the number of test cases in each test suite, for a
total of 504 test cases.
4.2 PHASE 2: EXTRACT THE COMPLEXITY AND THE COVERAGE FOR EACH TEST
CASE TO BUILD THE DATASET
The second step is the most important in our approach because it provides the framework for the
following phases, so it is complex and contains sub-phases. In order to build the dataset we need to
select the most important and effective attributes of the test cases. Based on the literature
(Mondal et al., 2015; Upadhyay & Misra, 2012; Elbaum et al., 2002), the average cyclomatic
complexity and the code coverage are the two most effective attributes in test case selection, so
our dataset contains the complexity and the coverage of each test case. Firstly, we use the CodePro
AnalytiX tool (CodePro, 2016) to compute the average cyclomatic complexity per method in the system,
which is a measure of the number of distinct paths of execution within the method. This is measured
by counting one path for the method itself plus one for each path created by conditional statements
(such as "if" and "for") and operators (such as "?"). After that we need to compute the coverage of
each test case. Code coverage is one of the metrics that measure the efficiency of the testing
process, and an important quality indicator for the effectiveness of the test cases: it measures how
well the test cases cover the code. In this paper we use EclEmma (Eclemma, 2016), a software testing
tool for Eclipse, to compute the code coverage. It automatically generates code coverage reports for
the test cases and supports four different types of code coverage (Eclemma, 2016): instructions,
branches, lines, and methods.
Due to space limitation we only show the average cyclomatic complexity and the code coverage for
test suite "CineTest" (see Table 3).
As we can see from Table 3, the cyclomatic complexity and the code coverage of this test suite are
very high, which means that these test cases are very effective in detecting errors in the system.
From the above two quality metrics we build the dataset, which consists of five attributes
(complexity, covered branches, covered lines, covered methods and covered instructions) for a total
of 504 test cases. Table 4 shows a sample of our dataset with the values of complexity and coverage
for each test case.
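The cyclomatic complexity count described in Phase 2 can be approximated by counting decision points in a method's source. The snippet below is a rough sketch only: the set of decision tokens is our assumption, and real tools such as CodePro AnalytiX parse the code properly rather than matching tokens.

```python
import re

# Assumed decision tokens: branching keywords, the ternary operator,
# and short-circuit boolean operators.
DECISIONS = re.compile(r"\b(if|for|while|case|catch)\b|\?|&&|\|\|")

def cyclomatic_complexity(method_source):
    """1 (the method's own path) + one per decision point found in the source."""
    return 1 + len(DECISIONS.findall(method_source))

# Hypothetical Java method: one "if", one "&&", one "for", one ternary "?".
java = """
int price(int seats, boolean member) {
    if (seats > 5 && member) { return seats * 8; }
    for (int i = 0; i < seats; i++) { }
    return member ? seats * 9 : seats * 10;
}
"""
print(cyclomatic_complexity(java))  # 5
```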
Let db = {db1, db2, …, dbn} be the set of data points and C = {c1, c2, …, ck} be the set of
centres.
1. Randomly select k cluster centres.
2. Calculate the distance between each data point dbi and every cluster centre.
3. Assign each data point to the cluster centre whose distance from it is the minimum over all
the cluster centres.
4. Recalculate each new cluster centre as cj = (1/nj) Σ dbi, where nj is the number of data points
assigned to cluster j and the sum runs over those points.
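The four steps above can be sketched in Python (an illustrative implementation, not the SPSS procedure the paper uses; for reproducibility it seeds the centres with the first k points instead of the random selection in step 1):

```python
def kmeans(points, k, iters=10):
    """Plain K-means following the four steps above."""
    centres = [points[i] for i in range(k)]  # step 1 (deterministic pick, not random)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # steps 2 and 3: assign each point to its nearest centre
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            clusters[j].append(p)
        for j, members in enumerate(clusters):  # step 4: recompute each centre
            if members:
                centres[j] = tuple(sum(dim) / len(members)
                                   for dim in zip(*members))
    return centres, clusters

# Two well-separated groups of hypothetical 2-D feature vectors.
points = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]
centres, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

Each group of three nearby points ends up in its own cluster after a few iterations.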
In this paper we use the Statistical Package for the Social Sciences (SPSS) (SPSS, 2016) to carry
out the clustering process using the K-means algorithm. SPSS is used by market researchers, health
researchers, survey companies, marketing organizations, data miners and others.
Based on the number of test cases in our system we decided on three clusters (K = 3) and chose the
default number of iterations, which is ten. According to the values of all the attributes we chose
the attribute "Line", which indicates the number of lines of code covered by a test case, as the
class for our dataset, because of the diversity of its values. Table 5 shows the final number of
test cases in each cluster after applying the K-means clustering algorithm.
For example, test cases 8, 9, 10, 11, 12, 13, and 14 in Table 6 are grouped in cluster one and are
at the same distance from the centre point. In this case we choose one of them and remove the rest,
because they are redundant test cases.
As a result of removing the redundant test cases we obtained 93 unique effective test cases out of a
total of 504. This means that test cases at the same distance from their cluster's centre point are
not important, and executing them would be inefficient. Table 7 shows the number of unique test
cases in each cluster after applying the removal process.
Figures 3 and 4 show the distance of each test case from its centre before and after removing the
redundant test cases from the three clusters, respectively.
Figure 4. Test case visualization after the reduction.
From Table 8 we note that the coverage still yields good results; in other words, the code coverage
decreased by an acceptable percentage for each type of coverage. The branch coverage, however,
decreased by 4.4%. (Zhang et al., 2014) mention that with different test case reduction approaches
the branch coverage increases after reducing test cases, and (Sampath et al., 2005) note that when
the attribute values vary, the coverage will increase. This is what we observed after checking the
branch attribute values: before the reduction process there were many zeros compared to the other
values (50, 17, 94) of the branch attribute, and after the reduction almost all of the zero values
were omitted, so the values of this attribute became more varied than before the reduction process.
5. CONCLUSIONS AND FUTURE WORK
We have demonstrated a methodology for test case reduction using a data mining clustering algorithm.
Firstly, test cases were collected. Then we computed the average cyclomatic complexity and code
coverage of each test case in order to select the most effective test cases for our dataset. After
that, we applied the K-means clustering algorithm with three clusters to our dataset. Finally,
redundant test cases with the same distance from their cluster centres were removed. We tested our
reduced test suite for coverage and noted that the removed test cases are not important in software
testing. In this paper we used a small program with a small number of test cases. In the future we
will check the effectiveness and efficiency of our approach on a large project, based on its ability
to reveal faults in the software under test. Furthermore, a new important feature of the test cases,
their quality, will also be considered as a parameter in the dataset in future work.
References
Muthyala, K., & Naidu, R. (2011). A novel approach to test suite reduction using data
mining. Indian Journal of Computer Science and Engineering, 2(3), 500-505.
Raamesh, L., & Uma, G. V. (2009). Knowledge Mining of Test Case System. International Journal
on Computer Science and Engineering 2(1), 69-73.
Yoo, S., Harman, M., Tonella, P., & Susi, A. (2009, July). Clustering test cases to achieve effective
and scalable prioritisation incorporating expert knowledge. In Proceedings of the eighteenth
international symposium on Software testing and analysis (pp. 201-212). ACM.
Raamesh, L., & Uma, G. V. (2010). Reliable Mining of Automatically Generated Test Cases from
Software Requirements Specification (SRS). International Journal of Computer Science
(IJCSI). arXiv preprint arXiv:1002.1199. 7(1).
Raamesh, L., & Uma, G. V. (2010). An efficient reduction method for test cases. International
Journal of Engineering Science and Technology, 2(11).
Roongruangsuwan, S., & Daengdej, J. (2010). Test Case Reduction Methods by Using CBR.
In International Workshop on Design, Evaluation and Refinement of Intelligent Systems
(DERIS2010) (p. 75).
Sharma, S., & Sharma, A. (2011). Amalgamation of Automated Testing and Data Mining: A Novel
Approach in Software Testing. International Journal of Computer Science (IJCSI), 8(5).
Sawant, V., & Shah, K. (2011). Automatic Generation of Test Cases from UML Models.
In International Conference on Technology Systems and Management (ICTSM), Proceedings
published by International Journal of Computer Applications (IJCA).
Heumann, J. (2001). Generating test cases from use cases. The rational edge,6(01).
Pravin, A., & Srinivasan, D. S. (2013). An Efficient Algorithm for reducing the test cases which is
used for performing regression testing. In 2nd International Conference on Computational
Techniques and Artificial Intelligence, Dubai (UAE) (pp. 194-197).
Maung, H. M., & Win, K. (2015). An efficient test cases reduction approach in user session based
testing. International Journal of Information and Education Technology, 5(10), 768.
Bouckaert, R. R., Frank, E., Hall, M., Kirkby, R., Reutemann, P., Seewald, A., & Scuse, D. (2015).
WEKA manual for version 3-7-12.
Chauhan, R., Batra, P., & Chaudhary, S. (2014). An Efficient Approach for Test Suite Reduction
using Density based Clustering Technique. International Journal of Computer Applications, 97(11).
Eclipse Classic 3.6.2 (2016), retrieved April 17, 2016, from https://eclipse.org.
Saifan, A.(2016). Test case reduction using data mining classifier techniques. The 2nd International
conference in Computer and Information Technology (ICCIT). Istanbul, Turkey.
Mondal, D., Hemmati, H., & Durocher, S. (2015, April). Exploring test suite diversification and
code coverage in multi-objective test case selection. In Software Testing, Verification and
Validation (ICST), 2015 IEEE 8th International Conference on (pp. 1-10). IEEE.
Upadhyay, A. K., & Misra, A. K. (2012). Prioritizing Test Suites Using Clustering Approach in
Software Testing. International Journal of Soft Computing and Engineering (IJSCE), 2(4).
Elbaum, S., Malishevsky, A. G., & Rothermel, G. (2002). Test case prioritization: A family of
empirical studies. Software Engineering, IEEE Transactions on, 28(2), 159-182.
Java Code Coverage for Eclipse. retrieved April 17, 2016, from
http://www.eclemma.org/index.html.
Witten, I. H., & Frank, E. (2005). Data Mining: Practical machine learning tools and techniques.
Morgan Kaufmann.
SPSS Statistics Base 17.0 User’s Guide. retrieved April 17, 2016, from
http://www.jou.ufl.edu/archive/researchlab/SPSS-Statistcs-Base-Users-Guide-17.0.pdf.
Sampath, S., Sprenkle, S., Gibson, E., Pollock, L., & Souter, A. (2005, May). Analyzing clusters of
web application user sessions. In ACM SIGSOFT Software Engineering Notes (Vol. 30, No. 4, pp.
1-7).
Kameswari, U. J., Saikiran, A., Reddy, K. V. K., & Varun, N. (2011). Novel Techniques For Test Suite
Reduction. International Journal of Science and Advanced Technology, 1(8).
Subashini, B., & JeyaMala, D. (2014). Reduction of Test Cases Using Clustering Technique.
In Proceedings of International Conference on Innovations in Engineering and Technology
(ICIET’14).
Jeffrey, D., & Gupta, N. (2007). Improving fault detection capability by selectively retaining test
cases during test suite reduction. IEEE Transactions on Software Engineering, 33(2), 108-123.
Harrold, M. J., Gupta, R., & Soffa, M. L. (1993). A methodology for controlling the size of a test
suite. ACM Transactions on Software Engineering and Methodology (TOSEM), 2(3), 270-285.
Dickinson, W., Leon, D., & Podgurski, A. (2001, September). Pursuing failure: the distribution of
program failures in a profile space. In ACM SIGSOFT Software Engineering Notes (Vol. 26, No. 5,
pp. 246-255). ACM.
Mansour, N., & El-Fakih, K. (1999). Simulated annealing and genetic algorithms for optimal
regression testing. Journal of Software Maintenance,11(1), 19-34.
Black, J., Melachrinoudis, E., & Kaeli, D. (2004, May). Bi-criteria models for all-uses test suite
reduction. In Proceedings of the 26th International Conference on Software Engineering (pp. 106-
115). IEEE Computer Society.
Yoo, S., & Harman, M. (2010). Using hybrid algorithm for pareto efficient multi-objective test suite
minimisation. Journal of Systems and Software, 83(4), 689-701.
Singh, R., & Santosh, M. (2013, December). Test Case Minimization Techniques: A Review.
In International Journal of Engineering Research and Technology (Vol. 2, No. 12 (December-
2013)). ESRSA Publications.
Zhang, C., Groce, A., & Alipour, M. A. (2014, July). Using test case reduction and prioritization to
improve symbolic execution. In Proceedings of the 2014 International Symposium on Software
Testing and Analysis (pp. 160-170). ACM.
Ammann, P., & Offutt, J. (2008). Introduction to software testing. Cambridge University Press.
Rothermel, G., Untch, R. H., Chu, C., & Harrold, M. J. (2001). Prioritizing test cases for regression
testing. Software Engineering, IEEE Transactions on,27(10), 929-948.
Dash, R., & Dash, R. (2012). Application of K-mean Algorithm in Software Maintenance.
International Journal of Emerging Technology and Advanced Engineering, 2(5).
Carlson, R., Do, H., & Denton, A. (2011, September). A clustering approach to improving test case
prioritization: An industrial case study. In Software Maintenance (ICSM), 2011 27th IEEE
International Conference on (pp. 382-391). IEEE.
Chantrapornchai, C., Kinputtan, K., & Santibowanwing, A. (2014). Test Case Reduction Case
Study for White Box Testing and Black Box Testing using Data Mining. International Journal of
Software Engineering and Its Applications, 8(6), 319-338.