The search for rules by OSR is limited to those rules that match a testing datum, but these rules are evaluated based on statistics computed over the ``training'' data with known labels. In many ways, OSR can be viewed as conducting a BruteDL search, but one that is focused on finding a best rule for one example only. In our context, this algorithm may better utilize the information provided by a testing datum than reliance on a simple default rule. Although OSR employs pruning methods to cut search cost, the search through all possible rules matching a test item could still be very expensive. It may not be worth applying an expensive method such as OSR as a default strategy for classification using a decision list learned by BruteDL. Therefore, a ``Minimum OSR'' strategy is used in our modification when the system encounters a testing observation that is not covered by any existing rules in the decision list. The strategy is called ``Minimum OSR'' because it involves a minimum amount of search in contrast to the original OSR. It constructs one rule that matches a test datum using a very greedy search.
In particular, Minimum OSR generates a (very linear) decision tree according
to the value vector of the testing object. At the root of the tree, Minimum
OSR calculates an information measure,
, for each possible
(or
) over the training data set. The information
measure is defined as

where:
is the probability of attribute
having value
;
is the probability that an observation is in
class
when it
has value
for
attribute
.
The test that returns the minimum value of IM is selected as the root of the tree. At the selected node, the branch that matches the value of the testing object is followed. At the child node, which covers only those training observations sharing the test observation's value of the root attribute, the value of IM for each available (remaining) test is calculated and the one with the lowest IM value is chosen. Then the branch that covers the testing object is followed. This procedure is repeated until some termination condition is satisfied. The termination condition is defined as one of the following:
To maintain the consistency of the Minimum OSR with BruteDL, we employed
the
-test to answer whether the Laplace accuracy at one node
significantly differs from the Laplace accuracy of its parent. In this test,
the value of a variable S is calculated as:

where:
observed number of correctly predicted examples covered at the
current node;
observed number of incorrectly predicted examples covered at the
current node;
;
.
The value of S is compared with a critical value with the
confidence level of
.
If S exceeds the critical value, the Laplace accuracy at
the current node is considered significantly different from that of its
parent node.
After the Minimum OSR search is terminated, an OSR rule, which is represented by the path from the root of the OSR decision tree to the leaf node, is formed. The current testing example is classified by this OSR rule instead of the default rule. Using this strategy, the default rule is used only under two circumstances:
In these cases, Minimum OSR cannot learn any informative rule for the current test item and the default rule has to be applied.
To make it clear, consider the following example. In this example, the attributes and classes are represented as:
Attributes:The current testing example is![]()
![]()
![]()
Classes:
![]()
Applying the Minimum OSR on
, a tree is built as shown in Figure
4. At the first step, we look at all
the attributes in the attribute set that are represented by the boxes
labeled
in Figure 4. At each of these nodes,
the training set is split according to the values of the corresponding
attribute. For example, at node
, the training examples are divided
into three subsets: node
contains all examples with the value
of
, node
contains all examples with the value
of
, and so on. The IM value is calculated on each
node that matches the value of the testing example. In Figure
4, IM is calculated at node
. Among
these nodes, the one that has the minimum value of IM, which is node
in this example, is selected. This process is repeated
and the next node selected is node
at level 2. A
-test
on the Laplace accuracy with confidence level 95% is then applied to node
against
its parent node, which is node
. Suppose
passes
the test and we get to node
at level 3, which
does not pass the
-test. In this case, the Minimum OSR procedure
terminates and the node
becomes a leaf node. The rule
generated by Minimum OSR is represented by the path in the bold line .
Finally,
is classified by the class that contains
the most examples at node
. For instance, if most of the
training examples covered by
are in class
, then the
corresponding rule generated by the Minimum OSR search is:
IF
=
and
=
THEN class =
,
and the current testing observation is classified as a member of class
.
Note, however, that this OSR generated rule is not retained. Rather, Minimum
OSR is rerun for each example not covered by a BruteDL rule list, but in
some future revision of this decision-list learning system, we might choose
to add OSR-generated rules as permanent members of the decision list.
Figure 4: An example of Minimum OSR
Jing Lin