next up previous
Next: INTRODUCTION

A Comparative Study of Default Strategies for a Decision List Learner


Masters Thesis, December, 1995

Jing Lin
Department of Computer Science
Vanderbilt University
Nashville, TN 37235 USA

Thesis under the direction of Dr. Douglas H. Fisher
Readers: Douglas H. Fisher & Vijay Raghavan

Postscript Version.


Abstract: Supervised learning algorithms, known as decision list learners, generate an ordered list of conjunctive rules (known as decision list). The rules are used to classify future observations. To classify future objects that are not covered by any of the learned rules, however, decision list learners often require a default strategy.

This thesis describes experiments with a selected decision list learner, BruteDL [10], which appends a default rule at the end of the decision list -- if no learned rule matches an observation, then the default rule classifies the observation as a member of the most frequent class among the training examples. The default rule leads to lower system prediction accuracy when the training set is small and classes are evenly distributed.

We propose an alternative default strategy, Minimum OSR, which is adapted from the OSR [1], algorithm. When no previously-learned rule matches a new observation, Minimum OSR is used to search from scratch (using the original training data) for a rule that does match the observation. The experimental results on BruteDL with Minimum OSR as the default strategy have shown significant improvement on prediction accuracy when the decision list is learned over a small training set.

Since using Minimum OSR during classification can be costly, two other strategies motivated by reducing classification cost are studied. One is to relax BruteDL's rule storage constraints and to store many more rules than would otherwise be stored in a decision list by BruteDL. The other strategy is to use a closest, partially matching rule when no learned rule matches a future object exactly. Our current implementations of these two strategies do not turn out to be as competitive as the Minimum OSR strategy in terms of the system's prediction accuracy.



ACKNOWLEDGEMENTS

I thank the people who helped me work towards the successful completion of this thesis. I thank my advisor, Dr. Douglas Fisher, for his guidance, encouragement and valuable advise at every step. I thank Dr. Vijay Raghavan, who was my second reader and seriously read my thesis and provided valuable suggestions. I also thank Richard Segal and Oren Etzioni, who provided the original BruteDL source code and several data files.

I also thank the graduate students in the Department of Computer Science. I appreciate the suggestions of Stefanos Manganaris, Julio Ortega and Doug Talbert. I thank my dear friends: Ming Chen, Yaorong Ge, Tushar Khinvasara, Cen Li, and Yang Wang, for their great friendship, suggestions and help. I thank Janet Graham, who taught me Academic English Speaking and Writing and carefully read parts of this thesis, for her suggestions and encouragement.

My grateful thanks go to my great family. I thank my husband, my mother, father and sister for their love and support. Without them, the completion of this thesis would not have been possible.


next up previous
Next: INTRODUCTION


Jing Lin
Translated to HTML by Douglas H. Fisher
Mon Apr 1 19:35:53 CST 1996