BruteDL [10] does not use the separate-and-conquer strategy
mentioned in Chapter 1. Instead, it employs a brute-force search. The
system does an
exhaustive depth-first-search over the whole conjunctive rule space
and builds a corresponding decision graph. A rule set is formed with
each rule corresponding to a path from root to leaf in the constructed
decision graph. Each member of the final rule set has to be the best
rule for at least one training example. A rule is best for an example if it
has the highest accuracy among all those rules that cover the example.
In addition, BruteDL also requires that a rule appearing in the final rule
list be homogeneous and minimal. A rule is homogeneous if it is
not significantly different from all its specializations in terms of
accuracy (i.e., the proportion of correct classifications that it makes),
in other words, a rule is homogeneous if the accuracy of a rule does not
significantly change when new conjuncts are added to its conditions, though
of course its coverage (i.e., the number of examples that it matches) may
decrease as new conjuncts are added. A rule is
considered minimal when it does not contain any irrelevant conjuncts; that is,
the accuracy of the rule does not significantly change when any of the
conjuncts are removed from its conditions. Both of the homogeneity and
minimality check are done using a
statistical test
on the accuracy of a rule over the training data. That is, if the accuracy
of a rule is not significantly different from both its parent (i.e., a
generalization) and its children (i.e., specializations), then the rule
is considered homogeneous and minimal. The ``best''
rules that are also homogeneous and minimal are then sorted according to
their accuracies over the training data. BruteDL uses Laplace accuracy as
the approximation of the actual accuracy of the rules.
The Laplace accuracy is defined as

where:
number of correctly classified training examples covered by the
rule;
total number of training examples covered by the rule;
number of classes.
Since the search space of BruteDL is the possible combinations of the attribute values, the search could be extremely expensive. To limit the search cost, BruteDL employs several pruning methods as shown below.
if the number of positive training examples covered
by
is less than a predefined constant, MinSimPositives,
and the number of negative training examples covered by
is less than a predefined constant, MinSimNegatives.
Because of these pruning methods, BruteDL does not really visit every node in the whole conjunctive rule space. In some cases, it could still be too expensive for BruteDL to search the conjunctive space even with the pruning strategies. In those cases, BruteDL applies a user-defined depth to bound the search.
BruteDL's search has another important characteristic that also impacts computational cost -- the search is systematic and nonredundant. For example, consider an attribute set:
Attributes: ¯![]()
![]()
Figure 2: The search tree of a simplest brute force search
Figure 3: BruteDL's search tree
A simple search method is to generate a complete
tree as shown in Figure 2.
In this tree, the rule corresponding to node K1, which can be represented as
, is equivalent to the rule represented
by node K2, which is
. Using this method,
these two nodes are visited separately, although they are actually equivalent.
There are four pairs of such duplicate nodes (K1 and K2, L1 and L2, M1 and
M2, N1 and N2) in Figure 2. It
is obviously not an efficient method. BruteDL does not visit these duplicate
nodes during its search, but exploits a lexicographic ordering on attribute
combinations. The tree generated by BruteDL's search on the same
attribute set is shown in Figure 3. As can be seen, BruteDL
visits each rule exactly once and the search tree is much smaller than the
tree displayed in Figure 2. The search illustrated in
Figure 2 explores K-permutations of attributes, which
is redundant. In contrast, the search illustrated in Figure
3 explores K-combinations of attributes.
As in other decision list learners (which we outlined in Chapter 1), the rules that finally appear in BruteDL's decision list are of the form
Conditions
Class.
The left hand side (LHS) of a rule is a conjunctive set of conditions (e.g.
), the right hand side (RHS) is a class
value
(e.g.
). Finally, the rules in the list are ordered by their
Laplace accuracies (in non-increasing order) over the training examples.
As for the default strategy, BruteDL appends a default rule at the end of the decision list. The LHS of the default rule contains an empty condition. The RHS contains the class that is most frequently seen among the training examples that are not covered by any of the learned rules. If there are no such examples, the default simply contains the class that is most frequently seen among all the training data. Using the default rule, whenever a future observation is not covered by any previous rules in the decision list, it is classified as a member of the default class (i.e., the RHS of the default rule). The decision list generated by BruteDL will assign a class value to any future observation.