The experimental results show that Minimum OSR outperforms the original BruteDL on several training sets, particularly when the training set is small. The rules learned by BruteDL in this experiment are the same as those in the experiment described in Chapter 2; the only difference is that the Minimum-OSR rule is applied before the default rule. Thus, we can say that the Minimum-OSR-plus-default-rule strategy outperforms the default-rule-only strategy.
The improvement comes from the fact that Minimum OSR is guided by the testing data. The rule it learns is more informative than the default rule because it takes the uncovered testing observation into account. The search is focused on that observation and should therefore learn a fairly accurate rule, one that is guaranteed both to match the uncovered observation and to be adequately supported by the training data.
For example, consider the case where only 10% of the iris data set is used as training data. BruteDL does not learn any homogeneous rule except the default rule. Since the iris data is evenly distributed among the three classes, the prediction accuracy of the default rule would be at most around 33%. Consequently, when the default rule is the only rule available to classify the testing data, the performance of the system is unsatisfactory. In Table 2, the mean prediction accuracy of BruteDL is only 32.6%, which is in fact the accuracy of the default rule, since it is the only rule in the decision list. When the Minimum-OSR-plus-default-rule strategy is used, a Minimum OSR search is done for each testing observation, and the prediction accuracy of the Minimum-OSR rule is much higher than that of the default rule. As shown in Table 6, the prediction accuracy of BruteDL with Minimum OSR is 63.3%. Compared to the results shown in Table 2, this is a considerable improvement.
A greater improvement is observed on the small soybean domain. The original BruteDL provides satisfactory performance only when 70% of the data is used for training, and gives poor predictions when smaller training sets are used. With Minimum OSR, the default rule is never used, and the prediction accuracies more than double on the smaller training sets. Improvement is also observed on the lenses domain, although the difference is not as significant as on the soybean data.
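The fallback order discussed above, and the roughly one-third ceiling of the default rule on three balanced classes, can be illustrated with a minimal Python sketch. It is not BruteDL or the actual Minimum OSR search: the function names, the `fallback_search` callback (a stand-in for a per-observation Minimum OSR search), and the 5/5/5 training split are all assumptions made for illustration, and the numbers it produces are not taken from the experiments.

```python
from collections import Counter


def default_rule(train_labels):
    """Return the majority class of the training labels (the default rule)."""
    return Counter(train_labels).most_common(1)[0][0]


def classify(observation, rules, fallback_search, default_class):
    """Classify with a decision list, then a fallback search, then the default.

    rules           -- list of (predicate, label) pairs, tried in order
    fallback_search -- hypothetical stand-in for a per-observation search such
                       as Minimum OSR; returns a label, or None if it finds
                       no rule
    default_class   -- label predicted when nothing else applies
    """
    for matches, label in rules:
        if matches(observation):
            return label
    label = fallback_search(observation)
    return label if label is not None else default_class


def accuracy(predictions, truths):
    """Fraction of predictions that equal the true labels."""
    return sum(p == t for p, t in zip(predictions, truths)) / len(truths)


# Illustrative split of a balanced three-class data set with 50 observations
# per class: 5 per class for training (10%), 45 per class for testing.
classes = ["setosa", "versicolor", "virginica"]
train_labels = [c for c in classes for _ in range(5)]
test_labels = [c for c in classes for _ in range(45)]

# With an empty decision list and a fallback that never finds a rule,
# every test observation receives the default class.
default_class = default_rule(train_labels)
predictions = [
    classify(None, [], lambda obs: None, default_class) for _ in test_labels
]
default_only_accuracy = accuracy(predictions, test_labels)
print(f"default-rule-only accuracy: {default_only_accuracy:.3f}")
```

Because the test classes are balanced, the default-only accuracy in this sketch is exactly one third whichever class wins the tie in training. Any gain over that figure has to come from the step that runs before the default rule, which is where a per-observation search would plug in.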
As shown in the previous section, the results of the experiments on Minimum OSR with and without -test are roughly the same. The difference between the prediction accuracies of the two different versions is always less than %. Thus, in the Minimum-OSR search, the overfitting problem does not seem to be a major factor.
Jing Lin