next up previous
Next: Motivation for Alternative Up: BRUTEDL Previous: BRUTEDL

The Algorithm

BruteDL [10] does not use the separate-and-conquer strategy mentioned in Chapter 1. Instead, it employs a brute-force search. The system does an exhaustive depth-first-search over the whole conjunctive rule space and builds a corresponding decision graph. A rule set is formed with each rule corresponding to a path from root to leaf in the constructed decision graph. Each member of the final rule set has to be the best rule for at least one training example. A rule is best for an example if it has the highest accuracy among all those rules that cover the example. In addition, BruteDL also requires that a rule appearing in the final rule list be homogeneous and minimal. A rule is homogeneous if it is not significantly different from all its specializations in terms of accuracy (i.e., the proportion of correct classifications that it makes), in other words, a rule is homogeneous if the accuracy of a rule does not significantly change when new conjuncts are added to its conditions, though of course its coverage (i.e., the number of examples that it matches) may decrease as new conjuncts are added. A rule is considered minimal when it does not contain any irrelevant conjuncts; that is, the accuracy of the rule does not significantly change when any of the conjuncts are removed from its conditions. Both of the homogeneity and minimality check are done using a statistical test on the accuracy of a rule over the training data. That is, if the accuracy of a rule is not significantly different from both its parent (i.e., a generalization) and its children (i.e., specializations), then the rule is considered homogeneous and minimal. The ``best'' rules that are also homogeneous and minimal are then sorted according to their accuracies over the training data. BruteDL uses Laplace accuracy as the approximation of the actual accuracy of the rules. The Laplace accuracy is defined as

where:

number of correctly classified training examples covered by the rule;

total number of training examples covered by the rule;

number of classes.

Since the search space of BruteDL is the possible combinations of the attribute values, the search could be extremely expensive. To limit the search cost, BruteDL employs several pruning methods as shown below.

  1. BruteDL does not expand a node if the rule corresponding to the path from the root to this node has 100% accuracy.
  2. BruteDL does not expand a node if the number of the positive training examples covered at the node is less than a predefined constant, MinPositives.
  3. BruteDL does not expand a node corresponding to a conjunctive condition as if the number of positive training examples covered by is less than a predefined constant, MinSimPositives, and the number of negative training examples covered by is less than a predefined constant, MinSimNegatives.

Because of these pruning methods, BruteDL does not really visit every node in the whole conjunctive rule space. In some cases, it could still be too expensive for BruteDL to search the conjunctive space even with the pruning strategies. In those cases, BruteDL applies a user-defined depth to bound the search.

BruteDL's search has another important characteristic that also impacts computational cost -- the search is systematic and nonredundant. For example, consider an attribute set:

 Attributes: ¯ 


  

Figure 2: The search tree of a simplest brute force search



  

Figure 3: BruteDL's search tree


A simple search method is to generate a complete tree as shown in Figure 2. In this tree, the rule corresponding to node K1, which can be represented as , is equivalent to the rule represented by node K2, which is . Using this method, these two nodes are visited separately, although they are actually equivalent. There are four pairs of such duplicate nodes (K1 and K2, L1 and L2, M1 and M2, N1 and N2) in Figure 2. It is obviously not an efficient method. BruteDL does not visit these duplicate nodes during its search, but exploits a lexicographic ordering on attribute combinations. The tree generated by BruteDL's search on the same attribute set is shown in Figure 3. As can be seen, BruteDL visits each rule exactly once and the search tree is much smaller than the tree displayed in Figure 2. The search illustrated in Figure 2 explores K-permutations of attributes, which is redundant. In contrast, the search illustrated in Figure 3 explores K-combinations of attributes.

As in other decision list learners (which we outlined in Chapter 1), the rules that finally appear in BruteDL's decision list are of the form

Conditions Class.
The left hand side (LHS) of a rule is a conjunctive set of conditions (e.g. ), the right hand side (RHS) is a class value (e.g. ). Finally, the rules in the list are ordered by their Laplace accuracies (in non-increasing order) over the training examples.

As for the default strategy, BruteDL appends a default rule at the end of the decision list. The LHS of the default rule contains an empty condition. The RHS contains the class that is most frequently seen among the training examples that are not covered by any of the learned rules. If there are no such examples, the default simply contains the class that is most frequently seen among all the training data. Using the default rule, whenever a future observation is not covered by any previous rules in the decision list, it is classified as a member of the default class (i.e., the RHS of the default rule). The decision list generated by BruteDL will assign a class value to any future observation.



next up previous
Next: Motivation for Alternative Up: BRUTEDL Previous: BRUTEDL



Jing Lin
Mon Apr 1 19:35:53 CST 1996