An effective control strategy must be coupled with an appropriate knowledge base organization. The knowledge base should facilitate the identification of sources of inconsistency and limit the cost of revision once inconsistencies are found. Without attention to these issues, the cost advantages of the above control scheme may be negated. The systems of Section 3.2 illustrate two distinct biases towards knowledge base representation that, in turn, reflect three views of concept representation (Smith & Medin, 1981). The classical view assumes that concepts are represented by logical expressions of the necessary and/or sufficient properties of concept members. Probabilistic representations assume that concepts list important concept properties, but qualify their individual importance with probabilities or other confidence measures. Last, exemplar representations are composed of specific observations; there may be no summary description at all. In large part, the latter two strategies are motivated by the inability of classical concepts to account for psychological findings that concept members differ in their perceived typicality (e.g., a robin is a more typical bird than a penguin).
In part, CLS, AQ, and GEM reflect the classical view of concept representation. Each assumes that perfectly consistent logical rules can be learned. However, by themselves, classical representations have problems dealing with the frequent inconsistencies that arise during incremental induction. Thus, these systems augment classical concepts with saved observations that can be used to recompute inconsistent portions of the knowledge base. With this hybrid classical/exemplar representation, a learning system can incrementally maintain consistency. In a trivial sense, revolutionary applications of nonincremental systems can be viewed as using this type of hybrid representation. Typically, as in AQ, incremental systems further limit the scope of repair and reduce costs by reusing only representative observations.
Despite the feasibility of using hybrid representations, we believe that such a strategy is less than optimal. Except for CLS, the flat structure of a classically-represented knowledge base probably makes recognition and repair computationally intolerable for all but the smallest knowledge bases. This also applies to strict exemplar models, which dispense with a summary concept representation entirely (Kibler & Aha, 1987). A second objection to the classical/exemplar hybrid is that it is motivated by an assumption that perfect consistency is possible and paramount, encouraging dependency-directed backtracking to be invoked following each misclassification. In real-world domains it may be preferable to alter the knowledge base conservatively based on evidence collected over a span of time. Of course, saved exemplars could be used to dynamically compute such evidence, but a cheaper alternative is to maintain summary evidence throughout learning.
Schlimmer's STAGGER system rejects the classical/exemplar hybrid representation in favor of probabilistic concepts that allow cheap assessments of `average' rule consistency. Probabilistic concepts allow aspects of an observation to be recorded without insisting on revision after every misclassification. This flexibility is highly desirable in an incremental learning system, and it accounts for STAGGER's abilities to tolerate noise and track environmental changes. In this, STAGGER addresses one objection to classical/exemplar hybrids, but its knowledge base remains `flat,' and thus problems of inefficient recognition remain for all but the smallest knowledge bases.
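To make the idea of cheaply maintained summary evidence concrete, the following is a minimal sketch of a probabilistic concept that updates co-occurrence counts after each observation and reports the `average' consistency of a simple rule without storing the observations themselves. The class name, method names, and the use of a plain conditional probability are assumptions made for illustration; they are not STAGGER's actual data structures or weighting measures.

```python
from collections import Counter


class ProbabilisticConcept:
    """Hypothetical summary-evidence store for one concept (illustrative only)."""

    def __init__(self):
        # How often each feature has been observed at all.
        self.feature_counts = Counter()
        # How often each feature has been observed in a concept member.
        self.feature_and_member = Counter()

    def update(self, features, is_member):
        """Record one observation; no rule revision is triggered here."""
        for feature in features:
            self.feature_counts[feature] += 1
            if is_member:
                self.feature_and_member[feature] += 1

    def consistency(self, feature):
        """Estimate P(member | feature), or None if the feature is unseen."""
        seen = self.feature_counts[feature]
        if seen == 0:
            return None
        return self.feature_and_member[feature] / seen


# Usage: observations arrive one at a time and only counts are kept.
concept = ProbabilisticConcept()
stream = [
    ({"flies", "feathers"}, True),
    ({"flies"}, False),
    ({"feathers"}, True),
    ({"flies", "feathers"}, True),
]
for features, is_member in stream:
    concept.update(features, is_member)
```

Because each update touches only a few counters, a single misclassification shifts the estimates slightly rather than forcing an immediate repair, which is the conservative behavior described above.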
Combining aspects of CLS and STAGGER, Schlimmer and Fisher's ID4 incrementally builds decision trees. Probabilistic representations at decision tree nodes allow knowledge base repair to be conservatively applied. Furthermore, the decision tree enables an efficient identification of nodes (subtrees) where modification optimally affects decision tree quality. ID4 disposes of a subtree when it discovers a predictive rule that is superior to the current one. In this manner, the system attempts to maximize the number of correct classifications, while recognizing that it may not be possible to achieve perfect accuracy. A second reason to discard a subtree is that the current rule loses statistical significance (according to chi-square) in response to new observations; if there is no significant rule, then classification should cease and a prediction can be made.
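The significance check at a node can be sketched as a Pearson chi-square test over the counts kept at that node. The code below is an illustration under stated assumptions, not ID4's implementation: the function names, the table layout (rows are attribute values, columns are classes), and the 0.05 significance level with one degree of freedom are all choices made here for the example.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table of counts.

    Rows are attribute values and columns are classes (assumed layout).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total == 0:
        return 0.0  # no observations yet, so no evidence of dependence
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic


# Critical value for 1 degree of freedom at the 0.05 level (a 2x2 table).
CRITICAL_05_DF1 = 3.841


def rule_is_significant(table, critical=CRITICAL_05_DF1):
    """Hypothetical check: keep the subtree only while the test is significant."""
    return chi_square(table) > critical


# An attribute that separates the classes perfectly, and one that does not.
predictive = [[10, 0], [0, 10]]
uninformative = [[5, 5], [5, 5]]
```

Under these assumptions, a node whose counts drift toward the uninformative pattern would stop passing the check, at which point the subtree below it is a candidate for removal and the node simply predicts from its class counts.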
ID4 illustrates several principles of knowledge base organization, which, when coupled with the control strategy of the previous section, promise to effectively support incremental learning. The tree structure of the ID4 knowledge base makes sources (rules) of inconsistency easy to locate. Once found, the implications of faulty rules are easily isolated. A tree structure is also used in Fisher's COBWEB system, while DAGs are used in Lebowitz's (1982) UNIMEM and Kolodner's (1983) CYRUS. Furthermore, COBWEB, UNIMEM, and CYRUS each identify nodes in the tree (or DAG) where certain predictions can be `optimally' made. Fisher (1987c) illustrates how these points, labeled by normative values, serve roughly the same function as the chi-square cutoff in ID4; normative values can be viewed as default values (Brachman, 1985) with probabilistic qualifiers that dynamically demarcate where predictions are justified.
STAGGER, ID4, and COBWEB indicate that hierarchically-structured probabilistic concepts are an effective knowledge base organization for incremental learning. Incremental systems that do not explicitly maintain such an organization will need to `compute' aspects of the structure on demand. For example, CLS, AQ, and GEM recompute many of the statistical and logical dependencies that are continuously maintained in a hierarchical knowledge base. A good knowledge base organization is one that `hard-wires' those dependencies that are most likely and/or most important. In this light, hierarchically-organized probabilistic concepts are efficient implementations of the classical/exemplar hybrids of earlier systems, rather than alternatives to them (cf. Fisher, 1987c).