BruteDL stores a rule in the decision list only when the rule is the ``best'' for at least one training example (i.e., it has the highest Laplace accuracy among the rules matching that example). This contributes to the space efficiency of BruteDL. No proof or experimental evidence has been given for the impact of this selective storing strategy on the prediction performance of the system. In the experiments described in Chapter 3, we found that BruteDL does not learn any homogeneous rules when trained on 10% of the glass, hepatitis and iris data. The problem is even worse on the smaller databases. On the small soybean domain, it learns some homogeneous rules only when given more than 50% of the data as training examples. On the lenses domain, BruteDL never learns any rule other than the default rule. In all these cases, every test example is covered by the default rule. On the glass, hepatitis and iris domains, the coverage of the default rule on the testing data dropped sharply to less than 5% once more than 20% of the data was used for training (see Table 4). Furthermore, the accuracy of the system is rather high on the databases that contain many examples, even when only 10% of the data was used for training. This suggests that BruteDL cannot learn much when it lacks training data, although it does well on adequate training samples: it can learn rules that are accurate enough to give satisfactory predictions for test observations regardless of the performance of the default rule (see Table 2).
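The selective storing strategy described above can be illustrated with a minimal sketch. It is not BruteDL's actual code. It assumes the commonly used Laplace estimate (n_c + 1) / (n + k), where n is the number of training examples a rule covers, n_c the number of those in the rule's predicted class, and k the number of classes; this section does not state the formula. The rule representation (a dictionary of attribute-value conditions plus a class label) and the names `matches`, `laplace_accuracy` and `select_best_rules` are hypothetical.

```python
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, str], str]  # (attribute values, class label)
Rule = Tuple[Dict[str, str], str]     # (conjunctive conditions, predicted class)


def matches(conditions: Dict[str, str], attributes: Dict[str, str]) -> bool:
    """A conjunctive rule matches when every condition holds."""
    return all(attributes.get(a) == v for a, v in conditions.items())


def laplace_accuracy(rule: Rule, examples: List[Example], n_classes: int) -> float:
    """Assumed Laplace estimate: (n_c + 1) / (n + k)."""
    conditions, label = rule
    covered = [y for x, y in examples if matches(conditions, x)]
    correct = sum(1 for y in covered if y == label)
    return (correct + 1) / (len(covered) + n_classes)


def select_best_rules(rules: List[Rule], examples: List[Example],
                      n_classes: int) -> List[Rule]:
    """Keep only rules that are the best match for at least one training example."""
    scores = [laplace_accuracy(r, examples, n_classes) for r in rules]
    keep = set()
    for x, _ in examples:
        matching = [i for i, r in enumerate(rules) if matches(r[0], x)]
        if matching:
            # Ties go to the earliest rule in the candidate list.
            keep.add(max(matching, key=lambda i: scores[i]))
    # Order the surviving rules by decreasing Laplace accuracy.
    return [rules[i] for i in sorted(keep, key=lambda i: -scores[i])]


examples = [
    ({"a": "1", "b": "x"}, "pos"),
    ({"a": "1", "b": "y"}, "pos"),
    ({"a": "0", "b": "x"}, "neg"),
]
rules = [({"a": "1"}, "pos"), ({"b": "y"}, "pos"), ({"a": "0"}, "neg")]
kept = select_best_rules(rules, examples, n_classes=2)
```

In this toy data the rule `b = y` is never the best match for any training example, because `a = 1` scores higher on the only example it covers. It is therefore discarded, even though it could still be the only rule matching some future observation.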
As the above results indicate, BruteDL could provide satisfactory performance on small training sets if it could learn enough rules from the training examples. The major obstacle preventing BruteDL from learning more rules is that only homogeneous rules that are both minimal and the best for at least one training example are allowed to appear in the decision list. It is natural to ask whether the system's prediction performance could be improved by relaxing these constraints so that BruteDL stores more rules. In particular, we modify BruteDL to store all the homogeneous rules regardless of whether they are best for any training example. Since all the homogeneous rules are generated by the search and supported by the training data, they should be more informative than the default rule alone. In principle, when BruteDL is not given enough training data, a rule that is not best for any training example may still be best for some future observation that needs to be classified. By storing all the rules, BruteDL should be more likely to generate a non-empty decision list from small training sets such as the 10% training set of the iris data.
In our implementation of the storing-all-rules strategy, a brute-force search of the conjunctive space is performed and every rule is passed to the homogeneity and minimality tests. The rules that survive both tests and the pruning methods are stored in the decision list, ordered by their Laplace accuracies, and are used to classify the test observations. With this version of BruteDL, the default rule should rarely be used, since more rules appear in the decision list and fewer test cases should be left uncovered by every rule other than the default rule. To see whether this is true, we ran experiments on the storing-all-rules version of BruteDL. A detailed description of the experimental design is given in the following section.
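The storing-all-rules strategy can be sketched as follows. This is an illustration under stated assumptions, not the BruteDL implementation. It assumes that a rule is homogeneous when it covers at least one training example and all covered examples share one class, and that a rule is minimal when dropping any single condition makes it non-homogeneous. It also assumes the Laplace estimate (n_c + 1) / (n + k). The pruning methods mentioned above are omitted, and the function names and the `max_len` bound on conjunction length are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, str], str]  # (attribute values, class label)


def matches(conditions: Dict[str, str], attributes: Dict[str, str]) -> bool:
    return all(attributes.get(a) == v for a, v in conditions.items())


def covered_labels(conditions: Dict[str, str], examples: List[Example]) -> List[str]:
    return [y for x, y in examples if matches(conditions, x)]


def is_homogeneous(conditions: Dict[str, str], examples: List[Example]) -> bool:
    """Assumed definition: covers at least one example, all of one class."""
    labels = covered_labels(conditions, examples)
    return len(labels) > 0 and len(set(labels)) == 1


def is_minimal(conditions: Dict[str, str], examples: List[Example]) -> bool:
    """Assumed definition: no rule with one condition removed is homogeneous."""
    items = list(conditions.items())
    return not any(
        is_homogeneous(dict(items[:i] + items[i + 1:]), examples)
        for i in range(len(items))
    )


def store_all_rules(examples: List[Example], n_classes: int, max_len: int = 2):
    """Return every homogeneous, minimal rule as (score, conditions, label)."""
    # Brute-force enumeration of conjunctions that cover at least one example.
    candidates = set()
    for x, _ in examples:
        for k in range(1, max_len + 1):
            for attrs in combinations(sorted(x), k):
                candidates.add(tuple((a, x[a]) for a in attrs))
    rules = []
    for cand in sorted(candidates):  # sorted for a deterministic order
        conditions = dict(cand)
        if is_homogeneous(conditions, examples) and is_minimal(conditions, examples):
            labels = covered_labels(conditions, examples)
            # Homogeneous rule: every covered example is correctly classified.
            score = (len(labels) + 1) / (len(labels) + n_classes)
            rules.append((score, conditions, labels[0]))
    rules.sort(key=lambda r: -r[0])  # highest Laplace accuracy first
    return rules


def classify(rules, default_label: str, attributes: Dict[str, str]) -> str:
    """The first matching rule decides; otherwise fall back to the default rule."""
    for _, conditions, label in rules:
        if matches(conditions, attributes):
            return label
    return default_label


examples = [
    ({"a": "1", "b": "x"}, "pos"),
    ({"a": "1", "b": "y"}, "pos"),
    ({"a": "0", "b": "x"}, "neg"),
]
rules = store_all_rules(examples, n_classes=2)
prediction = classify(rules, "neg", {"a": "2", "b": "y"})
```

In this toy data the rule `b = y` is kept even though it is not the best rule for any training example. The observation with `a = 2, b = y` is then classified by that rule rather than by the default rule.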
Jing Lin