Structure Approach
Structure Approach
Abstract-New models based on the hyper-geometric distribution for served curve. However, the lack of reasoning why such
estimating the number of residual software faults are proposed. The
functions were used would make us skeptical if their
applications of these models to real data are demonstrated.
methods could be applied to other cases in general.
Index Terms-Debugging, growth, hyper-geometric distribution, In view of the test process, we exploit the hyper-geo-
modeling, reliability, software, test. metric distribution as the basis of our model. This prob-
ability distribution was considered earlier in the context
of the capture-recapture sampling technique. (See, for
I . INTRODUCTION example, [SI.) While earlier studies noted the recaptured
Here we define the response function of a fault. When Let the probability that t ( i ) detects x faults be
an error due to b ( j ) is observed in (the test result o f ) Prob ( x l m , C ( i - I ) , w ( i ) ) . Then,
t ( i ), b ( j ) is said sensed by (the input test data o f ) t ( i ),
and the value of the response function Y ( i ,j ) of t ( i ) and Prob(x(m, C ( i - l ) , w(i))
b ( j ) is defined to be 1 . Otherwise, r ( i ,j ) = 0. S ( i ) is
a set of faults which are sensed by t ( i ). That is, S(i ) =
{ b ( j ) E B I Y ( i ,j ) = 1 } . Note that the response function
is defined for initial faults, before any maintenance action
is taken place. Further, we distinguish here the sensiti-
zation of faults from the detection of faults.
In the subsequent, EO would denote an estimate of any
parameter 8. w h e r e x m u s t b e o Ix Im i n { w ( i ) , m - C ( i - l ) ]
(this minimum value will be denoted hereafter by 17~).
B. Assumptions The probability distribution of ( 3 ) is called __the hyper-geo-
-
In debugging, test instances t ( l ) , t ( 2 ) , * . will be metric distribution. The expected value N ( i ) of N ( i ) is
exercised. At a test instance, some initial faults will be
newly (term newly attached to the detection will be omit- r 1
ted hereafter for simplicity) detected and the maintenance
will be carried out for these faults. The number of faults L _I
detected by a test instance may vary at actual test in-
stances. Further, new faults may be inserted, when the
detected faults are removed. In the subsequent analysis,
however, we assume the following for simplicity:
1 ) Faults detected by a test instance are surely re-
moved, before the next test instance is exercised. where ROUND represents an appropriate round operator
2) No fault is newly inserted during the removal of the to make the expected value an integer [ 121. If we use the
detected faults. Thus, the software reliability grows with following estimate E C ( i - 1 ) in place of actual C ( i -
the exercise of test instances. 1 ),
3) A test instance t ( i ) senses w ( i ) initial faults. w ( i ) .i - 1. - -
may vary with the condition of test instances over i. c
E C ( i - 1 ) = ;=0 N ( j ) , N ( O ) = 0 (5)
4) Of course, initial faults actually sensed by t ( i ) de-
pend on t ( i ) itself. However, they are regarded to be such we obtain step-by-step the estimate of the number of faults
w ( i ) initial faults that are taken randomly out of m initial detected by t ( i ) for i = 1 , 2, * and accordingly the
faults, since the order of the exercise of test instances is estimate of the cumulative number of faults detected by
indefinite or determined rather randomly within a class of test instances from t ( I ) to t ( i ).
tests. On the other hand, by using such estimate EC( j - 1 )
111. BASICMETHODOLOGY f o r j - 1 Ii a s
A. Model-] J-I __
c
C(i) = j = I N ( j ) . (2)
TOHMA cr ol. ESTIMATION O F RESIDUAL SOFTWARE FAULTS
'
347
4.
-
51
- -
2 433
~
4
-
477
5' 10. 4' 52 2 435 4 477
i 1.
5. 15. 4' 478
{m- c(i - 1) + 1 ) ( w ( i )+ 1 )
3
4
5
5.
6.
20'
"5
4'
4'
53
54
55
2
7
2
437
444
446
4
4
4
478
478
= ROUND 6 8 31 5 56 0 446 4' 479
m + 2 7 2 36 5 57 2 448 4' I07 479
8 7 43 5 58 3 451 4 108 479
9 4 47 5 59 2 453 4 109 480
10 2 49 5 60 7 460 4 480
(9) 11
I2
31
4
80
QA
5
5
61
62
3
0
463
463
4
4'
~
481
13 24 108 5 63 I 461 4.
Parameters to be estimated are m and w ( i ). In order to 14
I5
49
14
157
171
5
5
61 0 464 4. * Intcrpohled
65 I 465 4.
get these estimates, EC( i )'s will be evaluated with the 16
17
12
8
183
191
5
5
66 0 465 3.
67 0 465 3-
following measures. 18
19
9
4
?M
?M
5
5
68 1 466 3.
69 1 467 3
n 20 7 211 5 70 0 467 3'
21 6 217 5 71 0 467 3'
6 217
EF1 = (l/n) ( C ( i )- E C ( i ) l 22
23 4 230
5
5
72
73
I
1
468
469
3'
4
I = ) 24 4 251 5 74 0 469 4.
25 2 236 5 75 0 469 4.
I1 26 4 240 5 76 0 469 4.
15
35 6 277 6 85 0 473 2.
36 13 290 6 86 0 473 2-
EF3 = (l/n) 37 19
I5
309 8 87 2 475 2.
I=I 38 324 8 88 0 475 21
39 7 331 8 89 0 475 2.
40 I5 346 8 90 0 475 2.
I1 21
c
41 367 8 91 0 475 2-
42 8 375 8 92 0 475 2.
EF4 = (l/n) 43 6 381 8 93 0 475 2.
i= I 44 20 401 8 94 0 475 2.
45 IO 411 8 95 0 475 2'
46 3 414 8 96 1 476 2-
Em and E w ( i ) will be determined so as to give the min- 47 3
8
417 8
4
97 0 476 2.
48 425 98 0 476 2.
imum value of the measure. The possible change of w ( i ) 49 5 430 4 99 0 476 2.
TABLE I1
ESTIMATES
OF m , p , A N D maxw [(4)W A S USED]
EmarW
EC(i) /C(i)
500
40C
30C
20c
OBSERVED
EFI
EF2
EF3
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ EF4
1 oc
. .. .. . . . , . EFl'
( f
20 40 60 80 IO0 I20
Date
Fig. 1 . EC( i ) and C ( i ) [(4) was used].
TABLE V
FIFI.DRFPORIOF TEST
-
- - - - __ -
- - - - ~
- -
- -
- - -
- - -
- -.
~
1 0. 0
. NA 51 0. 5- NA 101 0' 27. NA 5521 N 50' NA
2 N 0. NA 52 N 5' NA 102 N 27' NA N 50. NA
3 '
0 0. NA 53 N 5. NA 103 '
0 27. NA 53 0. 50. YA
4 0. 0. NA 54 0. 5' NA 101 N 27' NA 54 I 51 NA
5 0' 0
. NA 55 I 6 NA LOS 1 28 NA 55 0. 51. NA
6 0' 0. NA 56 0. 6% NA lo6 O* 28' NA 56 N 51. NA
7 0' 0' NA 57 3 9 NA I07 N 28. NA 57 0. 51. NA
8 '
0 0' NA 58 N 9. NA IO8 0. 28. NA 58 0' 51' NA
9 0' 0' NA 59 0. 9. NA IW 0. 28. NA 159 0. 51' NA
IO 0. 0. NA 60 I1 NA I10 0' 28. NA 160 1 52 NA
I1 0' 0' NA 61 2 13 NA Ill 0. 28. NA 161 0
. 52- NA
I2 N N NA 62 0' 13' NA I12 1 9 NA 162 I 53 NA
13 0. 0' NA 63 2 IS NA 113 0. 29' NA 163 N 53' SA
14 0. 0' NA 64 1 16 NA I I4 '
0 29' NA 164 0' 53' NA
IS 2 2 NA 65 0' 16. NA 115 I 30 NA 165 0. 53. NA
16 0. 2. NA 66 0. 16' NA I I6 0' 30' NA 166 0. 53. NA
17 1 3 NA 67 0. 16. NA I I7 '
0 30' NA 167 N 53. NA
18 N 3* NA 68 18 NA 118 0' 30* NA 168 N 53- NA
19 N 3. NA 69 20 NA 119 '
0 30' NA 169 0
. 53' NA
20 0- 3' NA 70 N 20' NA I20 0. 30. NA 170 '
0 53' NA
21 0' 3. NA 71 '
0 20' NA 121 0. 30. NA 171 N 53' NA
22 N 3. NA 72 0' 20' NA I22 0. 30' NA I72 0. 53. NA
23 N 3' NA 73 0. 20. NA 123 '
0 30' NA 173 0. 53. NA
24 N 3. NA 74 0. 20' NA I24 '0 30' NA 174 0. 53' NA
25 0
. 3' NA 75 '
0 20. NA I25 2 32 NA 175 0. 53' NA
26 0. 3. NA 76 0. 20. NA I26 2 34 NA 176 N 53. NA
27 0' 3. NA 77 1 21 NA 127 (r 34. NA I77 0. 53' NA
28 0
. 3. NA 78 N 21. NA I28 1 35 NA 178 1 54 NA
29 N 3' NA 79 0. 21. NA I29 1 36 NA I79 01 54' NA
30 0. 3' NA 80 0. 21. NA I30 N 36. NA I80 0. 54. NA
31 0. 3' NA 81 0- 21. NA I31 0' 36- NA 181 0. 54- NA
32 0. 3. NA 82 0. 21. NA I32 0' 36. NA 182 N 54' NA
33 N 3' NA 83 0. 21. NA I33 1 37 NA 183 I 55 NA
34 0' 3. NA 84 0. 21. NA I34 1 38 NA 184 0. 55' NA
35 0. 3' NA 85 0' 21. NA 135 1 39 NA 185 N 55' NA
36 '
0 3' NA 86 1 22 NA 136 0. 39, NA 186 0' 55' NA
37 0. 3' NA 87 2 4 NA 137 N 39- NA 187 0. 55' NA
38 0
. 3. NA 88 0' 24. NA 138 0. 39' NA 188 0' 55. NA
39 N 3. NA 89 0. 24' NA 139 0. 39' NA I89 N 55- NA
40 1 4 NA 90 1 25 NA 140 I 40 NA I90 0. 55' NA
41 0- 4' NA 91 0. 25. NA 141 2 42 NA 191 0. 55. NA
42 0. 4' NA 92 0. 25- NA 142 N 42' NA 192 0' 55. NA
43 0' 4. NA 93 0. 25. NA 143 3 J5 NA 193 0. 55. NA
44 1 5 NA 94 0. 25' NA 144 1 46 NA 194 '
0 55' NA
45 0. 5. NA 9s 0- 25' NA 145 1 47 NA 195 '
0 55' NA
46 0. 5' NA 96 27 NA 146 0. 47- NA 196 0. 55. NA
47 0. 5' NA 97 '
0 27' NA 147 1 ?8 NA I97 0. 55. NA
48 0' 5. NA 98 0
' 27' NA 148 1 49 NA 198 N 55' NA
49 0- 5' NA 99 N 27- NA 149 1 so NA 199
- 0
- 55
- ~
NA
0' 5' NA 00 0. 27 NA I50 0' 50. NA
50 ~ ~ ~ - - - ~ - ~ - -
sharp bends of the growth curve of C( i ). The test data of different T ( j ) ' s . That is, in each T ( j ), j = 1 , 2, . . . ,
Table V is another example, where the software consists the detection of faults is modeled by the hyper-geometric
of about 14.5 kilolines of ASSEMBLER language and is distribution as before, but with different in ( j ) and \ t 3 ( i.
developed for a railway interlocking system. The direct j ). However, if we consider each test class still to test the
application of Model-1 to this data is not successful as software as a whole, m ( j ) should be the same constant
shown in Fig. 2. (Since the number of test workers was for all test intervals.
not available here, w ( i ) was assumed to be independent Further, at the beginning of T ( j ), C( n ( j - 1 ), ,j -
of i.) 1 ) must be taken into account, where t7 ( j - 1 ) is the
The bends of the growth curve of C( i ) possibly relate number of test instances of T ( j - 1 ). We assume that
to the change of the nature of test instances. In fact, we C ( 0 , j ) is not equal to 0, but C ( n ( j - l ) , j - 1 ) . In
see from the more detailed investigation that test instances the actual estimation, however, EC( n ( j - 1 ). j - 1 ) is
of Table V can be classified into test classes as shown in used in place of C ( n ( j - l ) , j - 1 ) .
Table VI. In general, the last interval will be devoted to an overall
Considering the change of the nature of test classes, we test. In such a case, the estimate of m ( j ) obtained there
divide the whole test duration into intervals T ( 1 ), T ( 2 ) , will be Em itself.
. . . , and so on. Then, the ith test instance in T ( j ) is
represented by t (i, j ). Accordingly, the cumulative num- B. Observation
ber of faults detected by test instances from t ( 1, 1 ) to t ( i, The application of the above segmentation technique to
j ) is denoted by C( i, j ). Its estimate will be suffixed sim- the test data of Table 1 revealed that the use of the same
ilarly. constant tn ( j ) over test intervals was not so favorable
We assume the parameters to take different values over [ 101. We speculate that the same constant m ( j ) generally
350 l E t E 'TRANSACTIONS ON SOFTWARE ENGLNEERING. VOL. IS. N O 3 . M A R C H 19x9
EC(i) / C ( i )
60
r
Date
TABLE VI11
makes plots of E C ( i ) rather smooth and accordingly, the Em( j ) A N D E w ( j )
use of the segmentation technique does not fit sharp bends
of the growth curve of C ( i ) well. Therefore, we assumed
in the application of the segmentation technique to the test
data of Table V that m ( j ) would take different values
over j .
Viewing test classes of Table VI and the bends on the
plots of C ( i ) of Fig. 2, we determined the test intervals V . COMPOSITE ESTIMATION
as shown in Table VII. As before, we assumed w( i, j ) to A . Model-3
remain constant in T ( j ). In the above two sections, we did not consider what
The result of the estimation is summarized in Table modules of the software were exercised by test instances.
VIII. EC( i , j )'s and C ( i, j ) ' s are plotted in Fig. 3 . The Here, a test class TC(i), i = 1, 2, * is assumed to -
improvement made by the segmentation technique is re- exercise only a part of the software. This part, denoted
markable. by D ( i ) , will be called simply the domain of TC(i).
TOHMA PI u l . : ESTIMATION OF RESIDUAL SOFTWARE FAULTS 35 1
Date
in place of m and C(i - 1 ), respectively. Except this for example, is used in the calculation of the estimate of
substitution, Em ( a ) will be obtained in the same way as the number of faults detected by a test instance of T C ( a ) .
described in Section 111. Em ( b ) and Em ( c ) will be ob- In Section 111-A, however, EC( i - 1 ) was used in place
tained likewise. of the actual C( i - 1 ). In this context, the way of cal-
Obviously, the estimate of the total number of initial culation here is a little inconsistent with those in the pre-
faults in the software may not be equal to the summation ceding sections.
of E m ( a ) , E r n ( b ) , and E m ( c ) . Note, for example, that Inprinciple,wecanuseEC,,(i- l , A n B ) , E C , ( i -
both Em ( a ) and Em ( 6 ) include faults in A f l B . There- 1, A f l B f l C ) , EC,.(i - 1, A f l C ) , and E C , ( i - 1,
fore, we must obtain E m ( A ) ,E m ( B ) ,E m ( C ) as well as A fl B n C ) . However, it makes extremely large the
E m ( A n B ) , € m ( B n C ) , E m ( C fl A ) , and E m ( A n number of cases of scanning parameters to find their op-
B n c). timum values. Therefore, this approach may not be fea-
Let C , and C ( A n B ) be the numbers of faults actually sible in real applications.
detected in D ( a ) and A fl B , respectively. We define
coefficient K , ( A n B ) to be the ratio of C ( A n B ) to C(,. B. Observation
Other coefficients will be defined similarly. Then, it seems Here we present the result of the composite estimation
reasonable to evaluate Em ( A n B ) by described above. The software used in this estimation was
developed for monitoring and real-time control. It con-
Em(A f l B ) = Em(a)K,,(A n B). (13) sists of about 90 modules. The size of modules is about 1
Similarly, Em ( A n B ) may also be evaluated by kiloline of PL/ 1, Fortran, and Assembler languages in
average.
Em(A nB) nA)
= E m ( b ) K,(B (14) The data was collected on what modules were exercised
by test instances and where faults were found. Since the
where K , ( B n A ) is the ratio of C(B nA ) = (C(A n software was tested on its three functional properties, the
B ) ) to Cl,. test instances were classified into T C ( a ) , T C ( b ) , and
If E m ( a ) and E m ( b ) would be accurately equal to T C ( c ) . Three domains D ( a ) , D ( b ) , and D ( c ) were
m ( n ) and m ( b ) , respectively, and further E m ( A n identified accordingly. These domains have common areas
B ) / E m ( a )and E m ( A f l B ) / E m ( b )would also be & ( A as shown in Table 1X. There was no module which was
f l B ) and &,(A n B ) , respectively, both (13) and (14) tested by T C ( b ) and T C ( c ) (that is, B fl C = null).
would give us exactly the same value. However, it is not The table shows the number of detected faults for each
always the case, because E m ( a ) and E m ( b ) are only the test date and for each subdomain. No faults were detected
estimates and possibly include errors. Further, due to the in A n C by T C ( a ) , in B n C by T C ( b ) , nor in C n A
rather limited number of test instances, the relationship and C n B by T C ( c ) . The total number of workers in a
that E m ( A f l B ) / E m ( a ) = K , ( A f l B ) and E m ( A fl day is divided for each test class in reference to the time
B ) / E m ( b ) = & ( A fl B ) may not hold exactly. Thus, spent for each test class. In some days, a test class (test
the above two equations may give us slightly different Val- classes) was (were) not exercised. These cases are indi-
ues. Therefore, we estimate E m ( A fl B ) by cated by "-" in the table. The number of detected faults,
Em(A n B ) = { Em(a)&(A n B ) the cumulative number of detected faults, and the number
of workers for each day are respectively summarized in
+ € m ( b )&(A fl
B)}/2. (15) the table.
TOHMA 1’1 a l . : ESTIMATION OF RESIDUAL SOFTWARE FAULTS 353
TABLE 1X
O F Ttsr
F I ~ LREPOKI
D
-
- -
T
- ~
~
Date -
7 8 9 10
- 12 13 14 15 _.
16
N ( i ) detected
by TC ( a ) in
(A 1
- - 1 1 2 1 0 1 2
(Am) - 0 0 1 0 0 1 0
( A M nC) - 0 - 0 0 1 0 1 0 0
Subtotal - 1 1 4 1 1 2 2
No. of worken
- 2.5 - 3 4 3.3 I 9.4 14 10
in TC ( a ) - - -
N (i) detected
by TC(b) in
(8 ) 1 2 - 12 4 3 4 0 0
( B a ) 0 1 - 0 0 0 0 0 0
(BnCfIA) 0 0 - 0 2 2 0 0 0
Subtotal 1 3 - 12 6 5 4 0 0
No. of workers -
0.6 1.5 12 8.7 10 5.7 4 IO
in TC(b) - -
N ( i ) detected
by TC(c) in
(C) 7 23 19 8 9 2 3 4 0
(CfMflB) 4 0 3 1 0 0 0 0 0
Subtotal 11 23 22 9 9 2 3 4 0
No.of workers 9.4 9.5 12
25 8 9 11.9 14 8
in T C ( c ) - - _.
NU) 12 26 23 22 19 8 8 6 2
C(i) 75 101 124 46 174 182 190 196 198
Total No. of
10 I1 28 28 30 26 27 32 28
worken - -
ESTIMATED
TABLE X
NUMBFR
Subc;ain
AflB
Bn C
C m
OF FAULTS
1
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING. VOL
I N SUBDOMAIYS
Esthated;of
120.9
Faults
I S . NO 3. MARCH 198’)
AflBflC 27.1
Total 248.1
250
200
150
100
OBSERVED
EF1
EF2
-
___._._ __ EF3
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ EF4
EFl *
50
C I
2 4 6 8 10 12 14 16
Date
Fig. 5. E C ( i ) and C ( i ).
number of detected faults grows smoothly with the exer- exercised in succession. In such a case, the division of
cise of test instances. the whole test duration into successive test intervals may
We have also shown ways of improving Model-1, i.e., not be justified. Some rearrangement of test instances will
the segmentation technique and the composite estimation. be required.
The application of the segmentation technique (Model-2) Here, we considered only the number of test workers
looks quite effective, particularly when the growth curve in defining w(i)/w(i,j ) . w ( i ) / w ( i ,j ) could be im-
of the cumulative number of detected faults bends sharply. proved, if additional factors such as the number of test
In determining the test intervals, we noted in part that items in a test instance and/or the progress of the skill in
the sharp bend occurred a little later than the time of the exercising test instances were taken into account. We are
change of test classes. We observed similar delays in other currently conducting further studies on this line.
examples. Thus, some mechanisms should be incorpo- The composite estimation looks promising. As we see
rated in the segmentation technique so that we could de- in Table IX, however, the number of test instances for
termine test intervals only by observing the change of test each test class was rather small to make the estimation
classes. properly. Although we hope the composite estimation as
Further, test instances of a test class are not necessarily well as the segmentation technique will be effective in
TOHMA ('I o/ F.SrlMATlON OF RFSIDUAL S O F T W A R E F A U L I S 355
where the plot curve of C ( j ) bends sharply, further Dr. Tohina is an Advisory Board Member of TC on Faull-Tolci-anl
Computing. IEEE Computer Society. and Vicc-Chairman of IFlP WG 10.4.
investigation with more data of test instances will be He served as ProFraln of FTCS.IO and w i l l
needed to evaluate our models definitely. Charman of FTCS-18 to be held i n Tohyo i n 1988 He I \ a mcnihcr ot thc
The composite estimation and the segmentation tech- hformation Processing Society of Japan. the Japan Sociely lor Softwarc
Science and Technology. the Institute o t Electronic\. Intornlation. and
nique should be combined with each other. Communication Engineers (IEICE). and other technical socictics i n Japan.
Among other awards from IEICE. he received the Excellent Booh Auard
ACKNOWLEDGMENT and the Okabe Memorial Award.