
Training Set Size Influences Default Rule Coverage

In most domains, default rule coverage drops sharply once the training set reaches a certain size: 30% of the data for hepatitis and 20% for glass, iris and monks #1. In the larger domains, even 10% of the data is already a sizable training set. This indicates that default rule coverage tends to be higher when the training set is small. For example, as Table 1 shows, iris is one of the smaller data sets in this study: 10% of the iris data contains only 15 examples, and from these 15 examples BruteDL learns no rule other than the default rule. Consequently, during testing every observation is classified by the default rule of the learned decision list (see Table 4). The same holds for the glass and hepatitis data sets. The lenses data, which contains only 24 instances, is an extreme case: BruteDL learns no rule except the default rule even when the training set reaches 70% of the data, so, as Table 4 shows, default rule coverage on this domain is always 100%. Default rule coverage also remains high on the small soybean domain, since that data set is not much larger than the lenses data. The picture is quite different for large data sets such as the cancer data, where default rule coverage is much lower.
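To make the measure concrete, the following is a minimal Python sketch of how default rule coverage could be computed for a decision list. It assumes that a decision list is an ordered sequence of rules followed by a default class, and that a test example counts toward default rule coverage only when no earlier rule fires. The names `Rule`, `DecisionList` and `default_rule_coverage`, and the `petal_length` attribute and thresholds in the usage lines, are hypothetical illustrations; this is not BruteDL's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: an example maps attribute names to values.
Example = Dict[str, object]


@dataclass
class Rule:
    # The rule fires when its condition holds for an example.
    condition: Callable[[Example], bool]
    label: str


@dataclass
class DecisionList:
    rules: List[Rule]      # tried in order; the first rule that fires wins
    default_label: str     # the default rule: used when no rule fires

    def classify(self, x: Example) -> Tuple[str, bool]:
        """Return (predicted label, True if the default rule was used)."""
        for rule in self.rules:
            if rule.condition(x):
                return rule.label, False
        return self.default_label, True


def default_rule_coverage(dl: DecisionList, test: Sequence[Example]) -> float:
    """Fraction of test examples that fall through to the default rule."""
    if not test:
        raise ValueError("test set is empty")
    hits = sum(1 for x in test if dl.classify(x)[1])
    return hits / len(test)


if __name__ == "__main__":
    # Hypothetical iris-style test examples.
    test = [{"petal_length": 1.4}, {"petal_length": 4.5}, {"petal_length": 5.8}]

    # A list holding only the default rule covers every test example.
    only_default = DecisionList(rules=[], default_label="setosa")
    print(default_rule_coverage(only_default, test))  # 1.0

    # With one learned rule, only the examples it misses reach the default.
    one_rule = DecisionList(
        rules=[Rule(lambda x: x["petal_length"] < 2.5, "setosa")],
        default_label="versicolor",
    )
    print(default_rule_coverage(one_rule, test))  # 2 of 3 examples fall through
```

When the learned list contains nothing but the default rule, as with BruteDL on 10% of the iris data or on the lenses data, every test example falls through and the coverage is 1.0, that is, 100%.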



Jing Lin
Mon Apr 1 19:35:53 CST 1996