2013 - Software Test Case Reduction Using Fuzzy Clustering
2013 - Software Test Case Reduction Using Fuzzy Clustering
net/publication/272014339
CITATIONS READS
19 361
2 authors:
All content following this page was uploaded by Gaurav Kumar on 16 December 2015.
ORIGINAL RESEARCH
Received: 22 August 2012 / Accepted: 23 July 2013 / Published online: 13 August 2013
CSI Publications 2013
123
254 CSIT (September 2013) 1(3):253–260
evaluating the functional correctness, performance and A test suite TS is a finite set of n test cases. The
reliability of individual as well as composite services. Data overall length of test-suite TS is the sum of the lengths of
P
mining [7] in which clustering techniques are used and in its test-cases ti: length ðTSÞ ¼ ni¼1 length ðti Þ. Redun-
which clusters similar data that can be applied to the test dancy may be started in the process of test data gener-
suite to significantly reduce the test suite. Many techniques ation because of the process of automation. Redundancy
for test case reduction are available like equivalence class is the repetition of data between one test case and the
testing, boundary value testing, pair wise testing etc [8–10]. other. So to save lot of time from executing redundant or
unnecessary test cases, optimization of test suite is
important to achieve. The behavioural patterns exhibited
2 Background by the test suite helps us in this process of automation
[14, 15].
In this paper, we propose a method which removes the test Cyclomatic complexity [16, 17] is the number of dif-
case redundancy by use of fuzzy clustering technique. ferent paths that can be followed to get from the beginning
Using fuzziness while doing clustering provides more to the end of a method. To adequately test a function
accurate results. Before going to the proposed methodol- having lot of branches and possible paths is more complex
ogy, in the next few sub sections we shall learn some basic and more difficult and is more prone to having undiscov-
understanding of concepts related to software testing and ered bugs as compared to a simple method with only a few
fuzzy clustering. The proposed technique is very similar to different paths through the code [18]. In spite of all this, a
equivalence class testing. In equivalence class method [11], little attention has been paid to cyclomatic complexity in
input domain of a program is partitioned into a finite the software development industry. This may be due to
number of equivalence classes such that one can reason- development teams not having adequate tools to quickly
ably assume, but not be absolutely sure, that the test of a and easily measure the cyclomatic complexity of their
representative value of each class is equivalent to a test of code. The difficulty of adequately testing methods rises
any other value. In our methodology, instead of classes we quickly as the complexity goes up. The reason for using
are using fuzzy clusters that may provide more accurate cyclomatic complexity as an initial guess for number of
and efficient results. clusters in our proposed methodology is that we require
some reliable initial point at which we can start our fuzzy
2.1 Software testing clustering mechanism. For selecting initial number of
clusters, cyclomatic complexity may be omitted but then it
Testing leads to uncovering problems which enhances would take more effort as every time centroid of the
further debugging. Software deployed without testing leads clusters has to be computed and on the basis of that
to unreliability. Hence testing the software to the full increase/decrease number of clusters.
extend is a necessary task while building a software.
Testing the software implies executing possible test cases. 2.2 Fuzzy clustering
The extent of testing can be evaluated using several tech-
niques like path coverage, conditional coverage, code In hard clustering, data is divided into distinct clusters,
coverage etc. This phase of Software Development Life where each data element belongs to exactly one cluster. In
Cycle (Khan et al. [12]) is the most expensive phase. It fuzzy clustering (also referred to as soft clustering), data
requires lot of time and effort. Hence optimization of test elements can belong to more than one cluster, and asso-
cases is a must. But first we need to see what a test case ciated with each element is a set of membership levels.
looks like. These indicate the strength of the association between that
A test case [7] is a collection of different inputs for the data element and a particular cluster. Fuzzy clustering is a
software. So a test case can be compared to a tuple in a process of assigning these membership levels, and then
database table. It has ID, input1, input2,…, inputn. A good using them to assign data elements to one or more clusters.
test case is one that is able to find faults with the software. Fuzzy clustering [19, 20] allows each feature vector to
Hence the output of test case will be a pass/fail. Length of belong to more than one cluster with different membership
test case also has an impact on resulting test suites. A degrees (between 0 and 1) and vague or fuzzy boundaries
longer test case [13] finds more difficult faults but reduces between clusters. In fuzzy clustering, each point has a
the number of test cases that are necessary to achieve the degree of belonging to clusters as in fuzzy logic, rather
test objectives. Also, a longer test case has disadvantages than belonging completely to one cluster only. Thus, points
such as higher computational costs and is more difficult to on the edge of a cluster may be in the cluster to a lesser
interpret manually. degree than points in the center of cluster. Any point x has
123
CSIT (September 2013) 1(3):253–260 255
a set of coefficients giving the degree of being in the kth The degree of belonging um ij is related inversely to the
cluster wk(x) [21]. distance from x to the cluster center. It also depends on a
parameter m that controls how much weight is given to the
2.3 Difficulties with fuzzy clustering closest center. The steps followed in fuzzy c-means
algorithm are:
• The optimal number of clusters C to be created has to
be determined so a good cluster validity criterion has to • Choose a number of clusters There are two main
approaches to determine the appropriate number of
be found.
• The character and location of cluster centroid is not clusters. One is Validity Measures that are scalar
necessarily known a priori, and initial guesses have to indices that assess the goodness of the obtained
be made. partition. Clustering algorithms generally aim at locat-
• The data characterized by large variability in cluster ing well separated and compact clusters. When the
shape, cluster density, and the number of points (feature number of clusters is chosen equal to the number of
vectors) in different clusters has to be handled. groups that actually exist in the data, it can be expected
that the clustering algorithm will identify them correct.
Second approach is Iterative merging or insertion of
2.4 Fuzzy c-means clustering algorithm clusters. The basic idea of cluster merging is to start
with a sufficiently large number of clusters, and
One of the most widely used fuzzy clustering algorithms, successively reduce this number by merging clusters
which allow one piece of data to belong to two or more that are similar (compatible) with respect to some well
clusters, is the Fuzzy C-Means (FCM) Algorithm. The FCM defined criteria.
algorithm [22–25] attempts to partition a finite collection of For our proposed algorithm, we are using first one i.e.
n elements X = fx1 ; ..., xn g into a collection of c fuzzy Validity Measures that will represent cyclomatic com-
clusters with respect to some given criterion. Given a finite plexity drawn through independent paths of the graph.
set of data, the algorithm returns a list of c cluster centers • Assign randomly to each point coefficients in the
C ¼ fc1 ; . . .; cc g and a partition matrix clusters.
M ¼ ui;j 2 ½0; 1; i ¼ 1; . . .; n; j ¼ 1; . . .; c • Repeat until the algorithm has converged (the coeffi-
123
256 CSIT (September 2013) 1(3):253–260
from any given data point to a cluster center weighted by (5) Use this saved file to check for coverage of condition/
that data point’s membership grade. path for the software. This can be determined by
The command line function fcm outputs a list of cluster using the Table 1. Multiple conditions may lie under
center and several membership grades for each data point. a single cluster.
Information returned by fcm can be used to help in building (6) Until possible coverage for the conditions is achieved,
a fuzzy inference system by creating membership functions assign value of c to somewhat greater than previously
to represent the fuzzy qualities of each cluster. defined and repeat from step3. This can be achieved
by calculating standard deviation of aggregate of
cluster centroids. The point at which the standard
2.6 Select cluster algorithm deviation stops declining further and starts increasing
will define the optimum number of clusters that will
C-means algorithm [7] clusters test cases on the basis that provide good coverage for the problem with reduced
test cases in the same cluster have the same behaviour. If redundancy.
we test different test cases from the same cluster, this
would add redundancy because they would exhibit the Effectiveness of the test suite minimization [26] can be
same results. So in order to reduce this redundancy we need calculated as follows:
a selective approach of choosing test cases. The step given Number of test cases in the reduced test suite
1 100 %
below proposes a method to choose tuple randomly from a Number of test cases in the original test suite
cluster.
Step1: Input Clustered data points (all test cases) from The impact of test suite minimization can be measured
c-means clustering algorithm. as follows:
Step2: Output Single data point i.e. centroid (reduced
Number of faults detected by the reduced test suite
test case) from each Cluster. 1 100 %
Number of faults detected by the original test suite
Step3: Process to follow:
• For each cluster (1,…, i), where each Ci contains all
the tuples (ti1,…, tin) that are mapped to it from 4 Example of proposed algorithm
c-means.
• Select one tuple randomly from (ti1, ti2,…, tin). Consider the problem for the determination of the nature of
• Add selected tuple with the label Ci to output file. roots of a quadratic equation. Its input is a triple of positive
integers (say a, b, c) and value may be from interval
At the end of execution of this algorithm we have one [0,100]. The output may have one of the following words:
tuple from each cluster. [Not a quadratic equation; real roots; imaginary roots;
equal roots].
Here for the above example, we are using automated test
3 Proposed methodology case generation and from that five hundred test cases are
considered. Cyclomatic complexity for the given problem
Based on the idea of software testing and fuzzy clustering, describes that the test cases exhibits seven different
the methodology proposed in this section deals with the behaviours. Validity Measures for initial value of Clusters
efficient reduction of test suite. Proposed algorithm in that will represent cyclomatic complexity to achieve a good
stepwise manner is given as under: coverage of the test cases trying to achieve highest test case
(1) Generate test cases for the given problem either reduction. A limited number of test cases each having
manually or using automated tool. different behaviour may be sufficient enough to test for
(2) Determine cyclomatic complexity to initially assign these seven different paths. So applying the methodology
the number of clusters. proposed in the previous section by clustering these test
(3) Apply fuzzy c-means clustering algorithm to the cases. Let us choose initial dummy ‘c’ value for the
above generated data. Here c signifies how many
clusters or in other words how many test cases we are
Table 1 Condition and corre-
looking for. The initial value of c will be equal to the sponding cluster defining cov-
Conditions Cluster
cyclomatic complexity. erage of the conditions Cond. 1, 2, 3,…, j C1
(4) Apply select cluster algorithm on the clusters formed
Cond. 2, 3,…, j C2
through applying c-means clustering algorithm. Exe-
Cond. i, i ? 1,…, j Cc
cute the algorithm and save the output in a file.
123
CSIT (September 2013) 1(3):253–260 257
Table 2 Initial conceptual condition and corresponding cluster based on cyclomatic complexity
Conditional statements Cluster,
Ci
If((a [= 0) && (a \= 100) && (b [= 0) && (b \= 100) && (c [= 0) && (c \= 100)) //results in the values do not constitute a C1
quadratic equation
If((a [= 0) && (a \= 100) && (b [= 0) && (b \= 100) && (c [= 0) && (c \= 100)) //results in the inputs belong to invalid C2
range
If!(a = 0) //results in the inputs belong to invalid range C3
If(validInput = -1) //results in the values do not constitute a quadratic equation C4
If(validInput == 1) && if(d == 0) //results in the roots are equal C5
If(d [ 0) //results in the roots are real C6
If(d \ 0) //results in the roots are imaginary C7
algorithm as seven and then clustering these test cases into we have to balance objective function with number of
clusters of different behaviour. clusters. So, while applying fuzzy c-means clustering
So the output of the above algorithm in iteration one algorithm, centroid of each clusters are also calculated for
gives seven different test cases. Test the above algorithm the number of variables (i.e. three for our dummy exam-
for coverage as per the Table 2. If all the paths/conditions ple). On the basis of aggregate of centroid of each cluster,
are not covered then repeat the process by taking some we calculate the standard deviation as shown in Table 3
higher value of c and until good coverage is achieved. By and drawing a graph for the same as shown in Fig. 2. The
this approach we reduced the time wasted in testing average running time of the testing will be: test case gen-
unnecessary test cases. Further, effectiveness of the test eration ? cyclomatic complexity calculation ? cluster
suite minimization and impact of test suite minimization generation ? select cluster ? testing of the software with
can be measured by the procedure given in proposed the new test suite.
methodology section. As shown in Table 3 and Fig. 2, when the numbers of
The methodology is applied by taking a dummy exam- clusters are nine, deviation among cluster centres’ is min-
ple of three variables where it reduces the test suite sig- imum as compared to deviation in other cluster centres or
nificantly. The technique may be further tested on in other words number of reduced test cases for the given
programs that contain many variables, several conditional problem. So nine will be the minimum no. Of reduced test
checks and coverage of the conditions for different values cases for the given problem will overall coverage.
of c. As the c value increases the number of test cases to be
tested increases and also there is improvement in coverage.
6 Limitations and future work
5 Evaluation and result analysis Here for the proposed methodology, initial guess of num-
ber of clusters has to be made. In our proposed method-
On the basis of cyclomatic complexity, when we take ology we are taking it as cyclomatic complexity as it in
initial number of clusters as seven for the given problem itself defines the full fledged conditional coverage. As
we found the clusters and convergence as shown in Fig. 1a everything costs, cyclomatic complexity calculation costs
through MatLab v2010. As seven is the initial guess for the good. So to assign initial number of clusters any other
number of clusters, it has to be checked by increasing methodology may be considered that will provide coverage
number of clusters for more minimized objective function advantage as well as reliable guess for number of clusters.
with respect to Iteration Count so that deviation in center of So In the future, we will try to use any other technique
cluster is minimum. for initial guess and get more optimized test case reduction.
Increasing the number of clusters 8, 9, 10, 11 as shown As a cluster may represent more than one condition/path,
in Fig. 1(b–e), we got lesser value of objective function. we can not say, test of a value for a cluster will give 100 %
Increasing number of clusters, objective function value gets same result as a test of any other value in the same cluster.
minimized i.e. there is improvement in coverage. As we Also in the future, we will derive the effectiveness of
have to reduce number of test cases with good coverage, so reduced test suite empirically.
123
258 CSIT (September 2013) 1(3):253–260
Fig. 1 Plot of clusters with their centers and objective function versus iteration count for 7, 8, 9, 10 and 11 numbers of clusters (a–e)
123
CSIT (September 2013) 1(3):253–260 259
0.50000
0.221492416
0.00000 0.034919942
7 8 9 10 11
Number of Clusters
123
260 CSIT (September 2013) 1(3):253–260
17. Emergy KO, Mitchell BK (1989) Multi-level software testing 22. Windham MP (1982) Cluster validity for the fuzzy c-means
based on cyclomatic complexity. In: Aerospace and electronics clustering algorithm. IEEE Trans Pattern Anal Mach Intell
conference, Proceedings of the IEEE, May 1989, Vol. 2, 4:357–363
pp 500–507 23. Tilson LV, Excell PS, Green RJ (1988) A generalisation of the
18. Wang L (2010) A program segmentation method for testing data fuzzy c-means clustering algorithm. Paper presented at the
generating based on path coverage, IEEE international confer- international conference on geoscience and remote sensing, vol 3,
ence on software engineering and service sciences, July 2010, pp 1783–1784
pp 565–568 24. Davis JP, Warms TM, Winters WR (1991) A neural net imple-
19. Akthar Shaheda, Rafi SkMd (2010) Improving the software mentation of the fuzzy c-means clustering algorithm. Int Jt Conf
architecture through fuzzy clustering technique. Indian J Comput Neural Netw (IJCNN-91) 2(2):953
Sci Eng 1(1):54–57 25. Groll L, Jakel J (2005) A new convergence proof of fuzzy
20. Khatibi BV, Jawawi DNA, Hashim SZM, Khatibi E (2011) A c-means. IEEE Trans Fuzzy Syst 13(5):717–720
new fuzzy clustering based method to increase the accuracy of 26. Yoo S, Harman M (2007) Regression testing minimisation,
software development effort estimation. World Appl Sci J selection and prioritisation: a survey. Softw Test Verif Reliab,
14(9):1265–1275 Wiley InterScience. doi:10.1002/000
21. Alavi R, Lotfi S (2011) The new approach for software testing
using a genetic algorithm based on clustering initial test instan-
ces. Int Conf Comput Softw Model (IPCSIT) 14:225–231
123