next up previous
Next: Experimental Design Up: THE OSR STRATEGY Previous: THE OSR STRATEGY

The Minimum OSR

The search for rules by OSR is limited to those rules that match a testing datum, but these rules are evaluated based on statistics computed over the ``training'' data with known labels. In many ways, OSR can be viewed as conducting a BruteDL search, but one that is focused on finding a best rule for one example only. In our context, this algorithm may better utilize the information provided by a testing datum than reliance on a simple default rule. Although OSR employs pruning methods to cut search cost, the search through all possible rules matching a test item could still be very expensive. It may not be worth applying an expensive method such as OSR as a default strategy for classification using a decision list learned by BruteDL. Therefore, a ``Minimum OSR'' strategy is used in our modification when the system encounters a testing observation that is not covered by any existing rules in the decision list. The strategy is called ``Minimum OSR'' because it involves a minimum amount of search in contrast to the original OSR. It constructs one rule that matches a test datum using a very greedy search.

In particular, Minimum OSR generates a (very linear) decision tree according to the value vector of the testing object. At the root of the tree, Minimum OSR calculates an information measure, , for each possible (or ) over the training data set. The information measure is defined as

where:

is the probability of attribute having value ;

is the probability that an observation is in class when it
has value for attribute .

The test that returns the minimum value of IM is selected as the root of the tree. At the selected node, the branch that matches the value of the testing object is followed. At the child node, which covers only those training observations sharing the test observation's value of the root attribute, the value of IM for each available (remaining) test is calculated and the one with the lowest IM value is chosen. Then the branch that covers the testing object is followed. This procedure is repeated until some termination condition is satisfied. The termination condition is defined as one of the following:

  1. All the training objects at the current node are in the same class.
  2. A statistical test shows that the Laplace accuracy at the current node is not significantly different from the Laplace accuracy at its parent node.
  3. The value of IM cannot be improved (i.e., the value of IM does not decrease) by expanding the current node. This is a generalization of condition 1.

To maintain the consistency of the Minimum OSR with BruteDL, we employed the -test to answer whether the Laplace accuracy at one node significantly differs from the Laplace accuracy of its parent. In this test, the value of a variable S is calculated as:

where:

observed number of correctly predicted examples covered at the current node;

observed number of incorrectly predicted examples covered at the current node;

;

.

The value of S is compared with a critical value with the confidence level of . If S exceeds the critical value, the Laplace accuracy at the current node is considered significantly different from that of its parent node.

After the Minimum OSR search is terminated, an OSR rule, which is represented by the path from the root of the OSR decision tree to the leaf node, is formed. The current testing example is classified by this OSR rule instead of the default rule. Using this strategy, the default rule is used only under two circumstances:

  1. The values of all attributes are unknown for the current testing object.
  2. No training example has a value in common with the current testing object for any of the attributes.

In these cases, Minimum OSR cannot learn any informative rule for the current test item and the default rule has to be applied.

To make it clear, consider the following example. In this example, the attributes and classes are represented as:

 Attributes:   

Classes:

The current testing example is

Applying the Minimum OSR on , a tree is built as shown in Figure 4. At the first step, we look at all the attributes in the attribute set that are represented by the boxes labeled in Figure 4. At each of these nodes, the training set is split according to the values of the corresponding attribute. For example, at node , the training examples are divided into three subsets: node contains all examples with the value of , node contains all examples with the value of , and so on. The IM value is calculated on each node that matches the value of the testing example. In Figure 4, IM is calculated at node . Among these nodes, the one that has the minimum value of IM, which is node in this example, is selected. This process is repeated and the next node selected is node at level 2. A -test on the Laplace accuracy with confidence level 95% is then applied to node against its parent node, which is node . Suppose passes the test and we get to node at level 3, which does not pass the -test. In this case, the Minimum OSR procedure terminates and the node becomes a leaf node. The rule generated by Minimum OSR is represented by the path in the bold line . Finally, is classified by the class that contains the most examples at node . For instance, if most of the training examples covered by are in class , then the corresponding rule generated by the Minimum OSR search is: IF = and = THEN class = , and the current testing observation is classified as a member of class . Note, however, that this OSR generated rule is not retained. Rather, Minimum OSR is rerun for each example not covered by a BruteDL rule list, but in some future revision of this decision-list learning system, we might choose to add OSR-generated rules as permanent members of the decision list.


  

Figure 4: An example of Minimum OSR



next up previous
Next: Experimental Design Up: THE OSR STRATEGY Previous: THE OSR STRATEGY

Jing Lin
Mon Apr 1 19:35:53 CST 1996