Chapter 2 Basic Case-Based Reasoning Techniques
Chapter 2 Basic Case-Based Reasoning Techniques
Solution
Outcome
Another way to describe case presentations is to visualize the structure in terms of the problem space and the solution space [Watson, 1997 and Doyle, 1998]. Figure 2 illustrates the structure. According to this structure, the description of a problem resides
10
in the problem space. The retrieval process identifies the features of the case with the most similar problem. When the best matching is found, the system uses similarity metrics to find the best matching case. In those processes, the solution of a case with the most similar problem may have to be adapted to solve the new problem.
Problem Space
= description of new problem to solve = description of solved problems = stored solutions = new solution created by adaptation A A = Adaptation R = Retrieval
Solution Space
Figure 2-1 Problem and solution spaces [Watson, 1997 and Doyle, 1998]
11
1. Indexes should be predictive. 2. Predictions that can be made should be useful ones, that is, they should address the purposes the case will be used for. 3. Indexes should be abstract enough to make a case useful for future cases. 4. Indexes should be concrete enough to be recognized in the future. Methodologies for choosing indexing includes manual and automated methods. In some systems, cases are indexed by hand. For example, when the cases are complex and the knowledge needed to understand cases well enough to choose indexes accurately is not concretely available, hand indexing is needed [Kolodner, 1993]. On the other hand, if problem solving and understanding are already automated, it is advantageous to use automated indexing methods.
12
similarity(CaseI , CaseR ) =
w sim( f
i i =1
I i
, f iR )
w
i =1
Where wi is the importance weight of a feature, sim is the similarity function of features, and fiI and fiR are the values for feature i in the input and retrieved cases respectively. Figure 2-2 displays a simple scheme for nearest-neighbor matching. In this 2-dimensional space, case3 is selected as the nearest neighbor because similarity(NC, case3)> similarity(NC, case1) and similarity(NC, case3)> similarity(NC, case2).
feature2 NC - New Case case1 similarity(NC, case1) NC
case2
Figure 2-3 How to find the nearest neighbor of the new case NC.
2.3.2
Inductive Retrieval Inductive retrieval algorithm is a technique that determines which features do the
best job in discriminating cases and generates a decision tree type structure to organize the cases in memory [Watson, 1997]. This approach is very useful when a single case feature is required as a solution, and when that case feature is dependent upon others. Here is a completed decision tree (see Figure 2-4) generated from the data in Table 2-2
13
[Watson, 1997]. The task is to predict the status of a loan from features of the loan
Job status
Salaried
W aged
Yes
No
Case 1
Case 3
Case 2
Case 4
Table 2-3 A Target Case Case No. Case X Loan Status ? Monthly Income $1000 Job Status Salaried Repayment $600
14
If a target case were presented as shown in Table 2-3, to determine the loan status of the target case, the algorithm would traverse the decision tree and search for the best matching case in the case-base. For the given the loan repayment, the algorithm first selects the left branch. After this, the algorithm traverses to the node (Income>$1500) and selects the left branching according the monthly income. We can therefore predict that the best matching case is Case 4. This suggests that the loan prospect is bad because Case 4s outcome is bad.
2.3.3
Nearest-Neighbor Retrieval vs. Inductive Retrieval Nearest-neighbor retrieval and inductive retrieval are widely applied in CBR
applications and tools. Table 2-4 shows strengths and weakness of two techniques. The choice between nearest-neighbor retrieval and inductive retrieval in CBR applications requires experience and experimentation. Usually, it is a good choice using nearestneighbor retrieval without any preindexing [Watson, 1997]. If retrieval time becomes an important issue, inductive retrieval is preferable.
Table 2-4 Comparison between nearest-neighbor retrieval and inductive retrieval
Retrieval Techniques Strength Weakness
Simple
Slow retrieval speed when the case base is large 1. Depends on pre-indexing which is a time-consuming process 2. Impossible to retrieval a case while case data is missing or unknown
Inductive Retrieval
In some CBR tools, both techniques are used: inductive indexing is used to retrieve a set of matching cases, then nearest-neighbor is used to rank the cases in the set according to the similarity to the target case.
15