As previously described, a decision list learner usually needs a default strategy to handle cases in which none of the rules in the decision list covers a new observation. The default strategy is particularly important when the training data are inadequate or strongly biased, because in that case the performance of the learner depends heavily on it. Consider a decision list that contains the following rules:
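IF A1 = 1 and A3 = 0 THEN class = C1
IF A2 = 0 THEN class = C2
Suppose the learner encounters an observation, O: (A1 = 1, A2 = 1, A3 = 1). Neither rule covers O: the first requires A3 = 0 and the second requires A2 = 0, so the learner must fall back on its default strategy.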
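The example above can be made concrete with a minimal sketch. The representation is an assumption chosen for illustration and is not BruteDL's own data structure: each rule is a pair of a condition dictionary and a class label, and the hypothetical helper `classify` returns `None` when no rule covers the observation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (conditions, predicted class),
# where conditions map an attribute name to its required value.
Rule = Tuple[Dict[str, int], str]

DECISION_LIST: List[Rule] = [
    ({"A1": 1, "A3": 0}, "C1"),  # IF A1 = 1 and A3 = 0 THEN class = C1
    ({"A2": 0}, "C2"),           # IF A2 = 0 THEN class = C2
]


def covers(conditions: Dict[str, int], observation: Dict[str, int]) -> bool:
    """A rule covers an observation when every condition is satisfied."""
    return all(observation.get(attr) == value for attr, value in conditions.items())


def classify(rules: List[Rule], observation: Dict[str, int]) -> Optional[str]:
    """Return the class of the first covering rule, or None if no rule covers."""
    for conditions, label in rules:
        if covers(conditions, observation):
            return label
    return None  # uncovered: a default strategy has to decide


O = {"A1": 1, "A2": 1, "A3": 1}
result = classify(DECISION_LIST, O)  # None, because A3 != 0 and A2 != 0
```

The `None` result marks exactly the situation in which a default strategy is needed.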
BruteDL employs one such default strategy: it appends a default rule to the end of the decision list, as described in the previous section. This strategy assumes that the training data are representative of the population of observations in the domain. When that assumption holds, the default rule can perform satisfactorily. But we cannot expect the default rule to work well in every case: when the training data are sparse or otherwise unrepresentative, the performance of BruteDL may be very poor.
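As a rough illustration of a default rule placed at the end of a decision list, the sketch below assumes, for illustration only, that the default rule predicts the most frequent class in the training data. BruteDL's actual default rule is the one defined in the previous section; the function names and the training labels here are hypothetical.

```python
from collections import Counter


def majority_class(training_labels):
    """Assumed default: the most frequent class among the training labels."""
    return Counter(training_labels).most_common(1)[0][0]


def classify_with_default(rules, observation, default_label):
    """Try each rule in order; fall through to the default rule at the end."""
    for conditions, label in rules:
        if all(observation.get(attr) == value for attr, value in conditions.items()):
            return label
    return default_label  # the default rule covers every remaining observation


rules = [({"A1": 1, "A3": 0}, "C1"), ({"A2": 0}, "C2")]
training_labels = ["C1", "C2", "C2"]  # hypothetical training labels
default_label = majority_class(training_labels)

prediction = classify_with_default(rules, {"A1": 1, "A2": 1, "A3": 1}, default_label)
```

Under this assumption the prediction for an uncovered observation depends only on the class distribution of the training data, which is why unrepresentative training data can hurt performance.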
Many other methods can serve as default strategies. For example, when none of the existing rules matches an observation that needs to be classified, the closest matching rule could be found and used to predict the class of the observation. Will this method outperform the default rule? We could also initiate a new search to learn a suitable rule for each uncovered observation. In theory, this method should be more informative than the default rule. Finally, BruteDL stores only the rules that are best for at least one training example. If we stored all the rules, regardless of whether they are the "best" for some training example, would the default rule be used less frequently? Would these or other methods give more satisfactory results?
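The closest-matching-rule idea mentioned above could be sketched as follows. The similarity measure is an assumption made for illustration (the fraction of a rule's conditions that the observation satisfies); the text does not specify how closeness is measured, and the function names are hypothetical.

```python
def match_fraction(conditions, observation):
    """Fraction of a rule's conditions that the observation satisfies."""
    if not conditions:
        return 1.0  # a rule with no conditions matches everything
    satisfied = sum(1 for attr, value in conditions.items()
                    if observation.get(attr) == value)
    return satisfied / len(conditions)


def closest_rule_label(rules, observation):
    """Predict with the rule whose conditions are most nearly satisfied.

    Ties are broken in favour of the rule that appears earlier in the list.
    """
    best_conditions, best_label = max(
        rules, key=lambda rule: match_fraction(rule[0], observation)
    )
    return best_label


rules = [({"A1": 1, "A3": 0}, "C1"), ({"A2": 0}, "C2")]
O = {"A1": 1, "A2": 1, "A3": 1}
label = closest_rule_label(rules, O)
```

For the observation O, the first rule satisfies one of its two conditions and the second satisfies none, so this sketch would predict C1. Other measures, for example ones that weight conditions by rule quality, would be equally plausible choices.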
BruteDL may therefore leave room for improvement. In theory, many default strategies can be more informative than the default rule, and thus perform better. The purpose of this thesis is to examine the performance of the default rule, to modify the default strategy of BruteDL, and to show the results of using other default strategies in addition to or in place of the default rule.
Jing Lin