A well developed branch in the machine learning field is supervised learning. In supervised learning, given a set of pre-classified examples, a system learns to label an observation as a member of a class. Many algorithms that learn to perform classification have been developed. Among these algorithms there is a group that construct a decision list [9] as the result of learning. These learning systems are called decision list learners. A decision list learner generates a set of conjunctive rules. These rules are ordered according to some heuristic measure and stored as a list, which is subsequently used in classifying future observations. A future observation is classified by the first rule that matches it in the list. The methods used to generate the rule sets and the heuristic measures used in ordering the rules may be different for different decision list learners. However the basic capability of these learners is the same: recognizing the characteristics represented by the pre-classified examples (known as training data), storing them as corresponding decision rules and utilizing a heuristic method to estimate the performance of the decision list in classifying future observations.
Many decision list learners [2] [4] [6] [9] learn conjunctive rules by searching through the conjunctive rule space for rules that satisfy certain conditions. After the search terminates, these rules are used to compose a decision list. For example, many algorithms use a separate-and-conquer method [2] [9]: Initially, a search starts over the whole training data set. During the search, one rule is learned at a time over the training data. Then the training examples covered by the most recently learned rule are removed from the training set and the search continues over the rest of the training data. This procedure is repeated until all the remaining training examples are in the same class or the training set is exhausted. Using this greedy, iterative method, the rules (except for the first rule) are learned from a shrinking subset of the training data, and the quality of a rule always depends on the selection of the previous rules.
The goal of any decision list learner is to classify future observations
using the output decision list. For example, a decision list learner may
generate a rule list shown in Figure 1 as its final
output. Given a new observation, each rule in the list will be checked
to see whether it matches the observation. The observation will be
classified by
the first matching rule. For instance, given an observation Obs, which
satisfies conditions 3 and 4 (but not conditions 1 and 2), the decision
list will assign a class
value
to it according to rule 2 in Figure 1.
Figure 1: A Sample Rule List
The decision rules appearing in the final output list of a decision list learner are generated by the search over training examples. It is possible that a future observation is considerably different from any observations included in the training set; thus such a future observation may not match any of the learned rules. This is very likely in the case where the decision list learner was not supplied with adequate training data or the sampling of training data is strongly biased (i.e., drawn from a small portion of the data description space). When an observation does not match any learned rules, an intelligent learner should be able to make a reasonable guess based on its knowledge of the domain. Therefore, a default strategy is always needed to make a decision list learner complete.
A simple and straightforward default strategy is to append a default rule, which assigns a default class value to any observation that does not match any of the learned rules, at the end of the learned rule list. The default class may be defined as the one that is most frequently seen among the training examples [8] [10]. Such a strategy may not be very informative.
The purpose of this thesis is to seek improvement on decision list learners' performance via experiments with alternative default strategies. We have chosen a decision list learning system, BruteDL [10], to run our experiments on. BruteDL learns only homogeneous rules (i.e., rules that are not significantly different from their specializations in terms of their expected prediction accuracy), and that are best for at least one training example. Three different strategies have been studied:
The first strategy shows significant improvement on the prediction accuracy when the decision list is learned over a small training set. However, using Minimum OSR as a default strategy can be costly. Thus the two other strategies are motivated by a desire to reduce the cost of classification.
Jing Lin