Shusaku Tsumoto* and Shoji Hirano. Pages 59-66 (8)
Background: Several symbolic inductive learning methods have been proposed, such as the induction of decision trees [1, 2, 3], decision lists, and the AQ family [4, 5, 6]. These methods and their many variants, first introduced in the 1980s and 1990s, are useful for finding frequent patterns in databases. However, conventional rule mining methods assume that the dataset is fixed at the first run, so they must be rerun from scratch every time new data appear. Since their computational complexity is of order n^2, repeated runs limit the applicability of these methods in the era of “Big Data”. To solve this problem, incremental learning methods have been introduced. However, most of these methods have several problems. First, they do not perform better than conventional rule learning methods. Second, they do not generate probabilistic rules. Third, their computational complexity is higher than that of conventional methods.
Methods: Using the framework of the rough set rule induction model, the authors first investigate theoretically how the statistical indices used as rule selection criteria are updated when an additional example arrives. There are four possible ways in which the indices can be updated, which in turn lead to two new rule selection criteria, each expressed as a pair of inequalities. If the statistical indices of a rule satisfy the first pair of inequalities, the rule remains usable even if an additional example does not support its classification. If the indices satisfy the second pair of inequalities, the rule may be removed from the list of classification rules when an additional example does not support its classification, or it may be added to the list when an additional example supports it. Such rules belong to subrule layers. Based on rough set theory, the authors develop a new rule induction method, PRIMEROSE-INC5 (Probabilistic Rule Induction Method based on Rough Sets for Incremental Learning Methods), which induces probabilistic rules incrementally.
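The way an additional example moves the statistical indices can be illustrated with a small sketch. It assumes the indices are accuracy and coverage with the usual rough set definitions (accuracy = |R and D| / |R|, coverage = |R and D| / |D|), which the abstract does not spell out. The names RuleCounts and update_effect are hypothetical, and the four cases enumerated here (the new example matches or does not match the condition R, and belongs or does not belong to the class D) are only one plausible reading of the four update possibilities mentioned above, not the authors' implementation.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class RuleCounts:
    """Running counts for one candidate rule R -> D (hypothetical helper)."""
    n_r: int = 0   # examples satisfying the rule's condition R
    n_d: int = 0   # examples belonging to the target class D
    n_rd: int = 0  # examples satisfying R and belonging to D

    def add(self, matches_r: bool, in_d: bool) -> None:
        """Fold one additional example into the counts in constant time."""
        self.n_r += int(matches_r)
        self.n_d += int(in_d)
        self.n_rd += int(matches_r and in_d)

    @property
    def accuracy(self) -> Fraction:
        # Assumed definition: |R and D| / |R|
        return Fraction(self.n_rd, self.n_r) if self.n_r else Fraction(0)

    @property
    def coverage(self) -> Fraction:
        # Assumed definition: |R and D| / |D|
        return Fraction(self.n_rd, self.n_d) if self.n_d else Fraction(0)


def _direction(before: Fraction, after: Fraction) -> str:
    if after > before:
        return "up"
    if after < before:
        return "down"
    return "same"


def update_effect(counts: RuleCounts, matches_r: bool, in_d: bool):
    """Add one example and report how (accuracy, coverage) moved."""
    before = (counts.accuracy, counts.coverage)
    counts.add(matches_r, in_d)
    after = (counts.accuracy, counts.coverage)
    return _direction(before[0], after[0]), _direction(before[1], after[1])


if __name__ == "__main__":
    # Enumerate the four kinds of additional example from the same start state.
    for matches_r in (True, False):
        for in_d in (True, False):
            counts = RuleCounts(n_r=4, n_d=5, n_rd=3)
            print(matches_r, in_d, update_effect(counts, matches_r, in_d))
```

Because only three counters per rule are touched, each update costs constant time per rule, which is the kind of saving an incremental method aims for compared with rerunning induction from scratch.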
Results: The system was evaluated on two medical datasets that were previously used to evaluate conventional rule induction methods. The first dataset concerns the differential diagnosis of headaches and consists of 1477 examples with 10 disease classes and 20 attributes. The second concerns meningitis and consists of 198 examples with 3 classes and 25 attributes. The system was compared with conventional rule induction methods using 10-fold cross-validation repeated 100 times. The experimental results showed that the proposed system outperformed the previously introduced methods.
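The evaluation protocol can be sketched as a generator of train/test index splits. This is a minimal illustration using only the Python standard library. The function name repeated_kfold, the seed, and the plain (non-stratified) shuffling are assumptions, since the abstract does not say how the folds were drawn.

```python
import random


def repeated_kfold(n_examples, k=10, repeats=100, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        # Reshuffle the example indices once per repetition.
        order = list(range(n_examples))
        rng.shuffle(order)
        for f in range(k):
            # Every k-th index starting at f forms the test fold.
            test_idx = order[f::k]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield r, f, train_idx, test_idx


if __name__ == "__main__":
    # 198 examples, as in the meningitis dataset described above.
    splits = list(repeated_kfold(198, k=10, repeats=100))
    print(len(splits), "train/test splits")
```

With k=10 and repeats=100, a learner is trained and tested 1000 times per dataset, and the reported figure would typically be the average over those runs.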
Keywords: Incremental rule induction, rough sets, accuracy, coverage, subrule layer, big data.
Department of Medical Informatics, School of Medicine, Faculty of Medicine, Shimane University 89-1 Enya-cho Izumo 693-8501