Research

The intuition behind the ITERATE control structure is to use the results of a hierarchical sorting scheme to choose a starting point for a partitional clustering scheme. Step 1 of the algorithm hierarchically sorts the pre-ordered objects. Steps 2 and 3 reflect the partitional clustering aspect of ITERATE. Step 2 chooses an initial set of classes from the concept tree and creates a flat partition as a good starting point for iterative redistribution in Step 3. Step 3 is the optimizing step: it uses a measure of similarity between an object and a class to determine the most appropriate class for the object. This step is repeated until a stable partition is derived, i.e., no objects move from one class to another. Iterative redistribution adopts a global perspective in trying to mitigate data-order dependency effects by allowing objects to redistribute anywhere in the partition.
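The redistribution loop of Step 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the similarity measure class_match (the mean, over attributes, of the fraction of class members sharing the object's value) is a hypothetical stand-in, since the measure ITERATE actually uses is not given here, and the initial labels stand in for the flat partition produced by Step 2. The names redistribute, class_match, and max_passes are likewise invented for this sketch.

```python
def class_match(obj, members):
    """Hypothetical similarity: mean over attributes of P(attribute = value | class)."""
    if not members:
        return 0.0
    total = 0.0
    for i, value in enumerate(obj):
        count = sum(1 for m in members if m[i] == value)
        total += count / len(members)
    return total / len(obj)


def redistribute(objects, labels, max_passes=100):
    """Reassign each object to its best-matching class until no object moves."""
    labels = list(labels)
    for _ in range(max_passes):  # safety cap (assumption, not part of the description)
        # Class descriptions are built once per pass from the current labels.
        classes = {}
        for obj, lab in zip(objects, labels):
            classes.setdefault(lab, []).append(obj)
        moved = False
        for idx, obj in enumerate(objects):
            current = labels[idx]
            best, best_score = current, class_match(obj, classes[current])
            for c in sorted(classes):
                score = class_match(obj, classes[c])
                if score > best_score:  # move only on a strict improvement
                    best, best_score = c, score
            if best != current:
                labels[idx] = best
                moved = True
        if not moved:  # stable partition: no object changed class
            break
    return labels


# Six nominal-valued objects and a deliberately imperfect starting partition.
objects = [
    ("red", "round"), ("red", "round"), ("red", "oval"),
    ("blue", "square"), ("blue", "square"), ("blue", "round"),
]
stable = redistribute(objects, [0, 0, 1, 1, 1, 1])
```

Because every object is compared against every class in the partition, not only against siblings in the concept tree, an object placed poorly by the order-dependent sorting step has a chance to move wherever it fits best.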
The basic version of ITERATE works only with nominal-valued attributes. It can be adapted to work with numeric-valued data, or with data that mix nominal and numeric attributes, by pre-discretizing the numeric-valued attributes. A discretization algorithm based on a thresholding mechanism adapted from image-processing techniques has been developed. The approach retains more of the characteristics of the original continuous-valued attributes than other discretizing methods.
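The thresholding mechanism itself is not specified above. Purely to illustrate how a threshold borrowed from image processing can turn a numeric attribute into a nominal one, the sketch below uses Otsu's criterion (pick the cut that maximizes between-group variance), a well-known image-thresholding method. It is a stand-in chosen for this example and should not be read as the project's actual algorithm; the function names and the "low"/"high" labels are hypothetical.

```python
def otsu_threshold(values):
    """Return the cut point that maximizes between-group variance (Otsu's criterion).

    Returns None when all values are equal and no cut exists.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    best_cut, best_score = None, -1.0
    left_sum = 0.0
    for k in range(1, n):  # candidate split: xs[:k] versus xs[k:]
        left_sum += xs[k - 1]
        if xs[k] == xs[k - 1]:
            continue  # no cut can separate two equal values
        mean_left = left_sum / k
        mean_right = (total - left_sum) / (n - k)
        # Proportional to the between-group variance w0 * w1 * (mu0 - mu1)^2.
        score = k * (n - k) * (mean_left - mean_right) ** 2
        if score > best_score:
            best_score = score
            best_cut = (xs[k - 1] + xs[k]) / 2
    return best_cut


def discretize(values, cut):
    """Map each numeric value to a nominal label relative to the cut."""
    return ["low" if v <= cut else "high" for v in values]


# Made-up measurements with two visible groups.
sample = [0.11, 0.12, 0.10, 0.31, 0.29, 0.30]
cut = otsu_threshold(sample)
nominal = discretize(sample, cut)
```

A data-driven cut of this kind places the boundary in the gap between natural groups of values, which is one way a discretization can preserve more of the attribute's original structure than fixed equal-width bins.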
In qualitative terms, good recoverable reserves have high hydrocarbon saturation, are trapped by highly porous sediments (reservoir porosity), and are surrounded by hard bulk rocks that prevent the hydrocarbon from leaking away. A large volume of porous sediments is crucial to finding good recoverable reserves. The task of this project is to derive qualitative equation models for porosity as a function of a number of geological phenomena, such as pore geometries, permeability, rock types, and depositional setting.
The equation discovery process consists of two main steps: context definition and equation derivation. Context definition identifies and formulates homogeneous regions of the data, each of which is likely to produce a unique and meaningful analytic formula for the response variable. Clustering techniques and a suite of visualization and interpretation routines make up a toolbox that assists the context definition task. Within each context, multi-variable regression analysis is conducted to derive analytic equations relating the response variable to a set of relevant predictive variables, starting with one or more of the initial base models. Domain knowledge and a heuristic search technique based on component-plus-residual plots dynamically guide the equation refinement process.
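The two steps can be sketched as follows, assuming the context labels have already been produced by clustering. The sketch fits an ordinary least-squares model separately in each context and computes the component-plus-residual (partial residual) values for one predictor, i.e. the residual plus that predictor's fitted contribution. All function names, context labels, and data values are invented for illustration, and plain linear base models are assumed; the actual base models and refinement heuristics are not described here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def partial_residuals(rows, y, coef, j):
    """Component plus residual for predictor j: (y - fitted) + b_j * x_j."""
    out = []
    for r, t in zip(rows, y):
        fitted = coef[0] + sum(b * x for b, x in zip(coef[1:], r))
        out.append((t - fitted) + coef[j + 1] * r[j])
    return out


def fit_by_context(rows, y, contexts):
    """Fit one regression per context label; returns {label: coefficients}."""
    models = {}
    for label in sorted(set(contexts)):
        idx = [i for i, c in enumerate(contexts) if c == label]
        models[label] = fit_ols([rows[i] for i in idx], [y[i] for i in idx])
    return models


# Synthetic data: context "A" follows y = 1 + 2*x1 + 3*x2,
# context "B" follows y = 5 - x1 + 0.5*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1),
        (0, 0), (1, 0), (0, 1), (2, 2), (3, 1)]
contexts = ["A"] * 5 + ["B"] * 5
y = [1, 3, 4, 6, 8, 5, 4, 5.5, 4, 2.5]
models = fit_by_context(rows, y, contexts)
cpr_a = partial_residuals(rows[:5], y[:5], models["A"], 0)
```

Plotting the partial residuals against the predictor shows how well the current term captures that predictor's effect: a roughly straight trend supports the linear term, while systematic curvature suggests trying a transformed term in the next refinement step. Fitting all ten points with a single equation would blur the two distinct relationships, which is the motivation for defining contexts first.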
The discovery system architecture is illustrated below: