next up previous contents
Next: Application areas for incremental Up: Plan for Continued Research Previous: Plan for Continued Research

Domain-independent incremental methods

One problem with ID4 is that under certain conditions it may discard subtrees and thus much of the information gleaned from past training. New observations will eventually allow ID4 to `fill in the gaps,' but these additional costs may be avoidable. A recent extension to ID4, called ID4S, is similar to CLS in that a number of observations can be retained and reused. This represents a partial compromise with the classical/exemplar hybrid. However, ID4S reuses observations to occasionally rebuild subtrees, whereas CLS rebuilds the entire decision tree following each misclassification. Computer experiments with ID4S indicate that in the domain of chess end games (several thousand observations), few observations need to be saved (on the order of 25 - 100) to match the learning rate of revolutionary ID3. Recently, Utgoff (1987) has devised an extension to ID4, called ID5, that saves all observations. It shows how this store of observations can be exploited so that a faulty subtree can be reorganized and not simply rebuilt as with ID4S.

Work in the first year will extend ID4 in directions suggested by ID4S and ID5. These extensions will identify and save only the most `useful' observations. This general strategy is inspired by Michalski and Larson (1978), but our work draws from a significant body of psychological research that indicates humans do not treat all observations as equally representative or typical (Mervis & Rosch, 1981; Smith & Medin, 1981). Typical instances tend to share many properties with members of the same class and have few properties in common with members of different classes. To identify useful observations, ID4S will use collocation (Jones, 1983), a probabilistic measure that captures the within- and between- concept factors that effect typicality. In addition, we have verified (Fisher, 1987c; 1988) the consistency of collocation as a predictor of typicality with respect to human data (Rosch & Mervis, 1975).

Closely related to the problem of `remembering' is the problem of `forgetting.' ID3 uses the chi-square measure to prune portions of the knowledge base that do not help classification. This strategy can lead to a simpler knowledge base with little ill or even positive effect on prediction accuracy (Michalski, 1987; Quinlan, 1988). However, these results are in the context of nonincremental systems where many observations can be examined simultaneously. We will explore the utility of forgetting during incremental learning, where overly liberal strategies can have dilatory effects on learning speed. In fact, Fisher and Schlimmer (1988) have already found that a forgetting strategy based on chi-square is detrimental early in learning, though its benefits increase with training.

Efficient Induction of Decision Trees (Computational Intelligence, 1989)

This paper will describe results using an extension of ID4S that saves and reuses observations based on typicality. The major dimensions of a full-factorial experimental study include the relative utility of typical as well as atypical observations and the efficacy of concept simplification over the duration of learning. Revolutionary versions of ID3 will provide the baseline performance and cost characteristics against which ID4S will be compared.

A second project is to extend ID4S to track environmental changes. In particular, STAGGER modifies a concept definition to the extent that is minimally required to retain consistency after an environmental shift. Unfortunately, ID4S may be forced to recompute the entire decision tree should the importance of the root attribute be lessoned following changes in the environment. In this case, the utility of tree reorganization (cf. Utgoff's ID5) becomes an attractive alternative to regeneration.

Tracking Environmental Change During Learning (Machine Learning, 1989)

The abilities of STAGGER and an ID4S/ID5 hybrid to track environmental changes will be reported. The relevant dimensions of this study include the ability of each system to minimize knowledge base modifications in response to change and the time to reconverge on an accurate knowledge base. The ability of each system to distinguish `noise' from long-term changes will also be analyzed.

In addition to ID4 and other systems that learn from examples, we will continue our investigations of conceptual clustering in the first year. In this context, Fisher's COBWEB will serve as the initial model; it builds probabilistic concept trees that support good prediction along many (versus a single) attributes. In fact, prediction with a single COBWEB classification tree approximates the accuracy of multiple, special-purpose ID3 decision trees (Fisher, 1987a, c). COBWEB demonstrates how default or normative values can be dynamically identified to facilitate cost-effective prediction. In many ways, the use of normative values is analogous to the forgetting strategy of ID4. Despite COBWEB's performance, a DAG would allow more attribute-to-attribute dependencies to be directly encoded in the knowledge structure. In general, a DAG increases the flexibility that is important during incremental learning, but at the cost of maintaining multiple classification paths. DAG's are employed by Kolodner (1983) and Lebowitz (1982, 1987) in CYRUS and UNIMEM, respectively. However, neither system is characterized along dimensions such as learning speed, cost, or prediction accuracy, nor is there any indication that the dependencies that they capture are any way `optimal.'

Critical Reviews of Incremental Concept Formation Systems (Artificial Intelligence, 1989)

This paper will review reimplemented concept formation systems, including COBWEB, UNIMEM, CYRUS, and EPAM (Feigenbaum & Simon, 1984). Important dimensions for comparing these systems include the prediction accuracies along all descriptive attributes, the time required to converge on stable classifications, and the cost of assimilating a single observation. The relative costs and benefits of DAG versus tree structured knowledge bases will be addressed and will motivate a second version of COBWEB, which forms DAG-structured classifications.

The empirical studies above will shed light on a number of fundamental issues of knowledge representation and learning.

An Analysis of Knowledge Representation Schemes (Cognitive Science, 1990)

This paper will analyze the inherent tradeoffs in different knowledge representation schemes, including the classical, probabilistic, and exemplar schemes, as well as more sophisticated approaches like decision trees and probabilistic concept trees and DAGs. The paper will characterize these representation schemes in terms of pliability during incremental learning, prediction accuracy under ideal and noisy conditions, and the understandability of learned knowledge. This analysis will rely on the empirical studies reported in earlier papers. Concept simplification, default value identification, and tracking will emerge as important secondary issues. In addition, the paper will address the psychological (e.g., typicality effects) plausibility of these schemes. We have intentionally downplayed the cognitive modeling aspects of our work, but it has nonetheless played an important role in our respective dissertations (Fisher, 1987c; Schlimmer, 1987a).


next up previous contents
Next: Application areas for incremental Up: Plan for Continued Research Previous: Plan for Continued Research

Douglas H. Fisher
Thu Jan 16 18:31:47 CST 1997