next up previous
Next: Chapter Summary Up: STORING ALL HOMOGENEOUS Previous: Default Rule Performance

Discussion

An explanation of why Storing-all-rules BruteDL gives the ``surprising'' results can be provided by the examination of the decision lists generated by the two different versions of BruteDL. Figure 5 and Figure 6 show the rules that appeared in the two different versions' outputs and their performance on the testing data in one of the 25 trials on the 10% sized training data set on cancer domain.


  

Figure 5: The Rules and Their Performance in the Fifth Trial of BruteDL on Cancer Data Using 10% Sized Training Set


  

Figure 6: The Rules and Their Performance in the Fifth Trial of Storing-All-Rules BruteDL on Cancer Data Using 10% Sized Training Set


As displayed in the figures, there are more rules in the decision list generated by the storing-all-rules BruteDL, which is what we expected. However, it is surprising that some of the ``best'' rules found by BruteDL are not even included in the list of ``all'' the rules stored by our variant. For example, the first and seventh rules in the original BruteDL's output do not exist in Figure 6. This is because the storing-all-rules BruteDL does not really store all the rules. It actually stores only all the rules that have passed the homogeneity and minimality test. In the test, a rule is checked against all its specializations to see if its Laplace accuracy is not significantly different from those of all its specializations. By storing only the ``best'' rules, the number of the specializations of a single rule may be less than that when storing all rules. Therefore, the first rule in Figure 5 may pass the -test against all its specializations among the ``best'' rules. However, among all the rules, there may be one specialization that is not the best for any training example, (i.e., the one that would not be stored and thereby not be checked against in the original BruteDL.) which gives a Laplace accuracy that is statistically different from rule 1 and rule 1 has to be discarded. Thus, some of the ``best'' rules are eliminated from the all-rule-list. This may lead to a decreased system prediction accuracy if the eliminated ``best'' rule covers many future examples. Because of the elimination of some ``best'' rules, the frequency that the default rule is used in classifying future observations may not decrease as we might think it should. In the case of our example in Figure 5, rule 1 provides correct predictions on 62.9% of the testing data. These observations are likely to be misclassified and the accuracy of the whole system may be effected when rule 1 is removed in the storing-all-rule version. This explains why the default rule is used on some data sets (e.g., 50% and 70% sized iris training sets) although it was not used by the original BruteDL -- the only best-rule matching some test observations disappeared in the all-rule list and those observations have to be classified by the default rule.

Furthermore, the position of the new rules in the output of storing-all-rules BruteDL also affects the system's performance. For instance, the third rule in Figure 6 is not included in Figure 5 because it is not a best rule for any training example. However, it has a relatively high Laplace accuracy over the training examples. When BruteDL sorts rules in the decision list according to their Laplace accuracies, this not-best rule is placed at a position towards the beginning of the list. Then, during the later classification phase of the testing data, some of the testing observations may be misclassified by this not-best rule although it would be correctly classified by some ``best'' rules that are placed after the not-best rule. This is what happened in our example. In Figure 6, rule 3 provides incorrect predictions for 4.3% of the testing observations before other rules get a chance to correctly classify them. Rule 7 in Figure 6 is equivalent to the rule 5 in Figure 5. It correctly classifies 2.4% of the training data while none is covered by it in the storing-all-rules BruteDL. Those 2.4% testing data may be misclassified by not-best rules like rule 3, 5 or 6 in Figure 6.



next up previous
Next: Chapter Summary Up: STORING ALL HOMOGENEOUS Previous: Default Rule Performance

Jing Lin
Mon Apr 1 19:35:53 CST 1996