Constructing Decision Trees for Mining High-speed Data Streams
-
Abstract
The Very Fast Decision Tree (VFDT) is one of the most successful and prominent algorithms designed specifically for stream data classification. In this paper, we develop CFDT (Clustering Feature Decision Tree model), a new decision tree induction model that extends VFDT. CFDT applies a micro-clustering algorithm that scans the data only once to provide statistical summaries of the data for incremental decision tree induction. Moreover, the micro-clusters also serve as classifiers in the tree leaves, improving classification accuracy and reinforcing the any-time property. Our experiments on synthetic and real-world datasets show that CFDT is highly scalable for data streams while also achieving high classification accuracy at high speed.
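The two mechanisms summarized above are one-pass statistical summaries and micro-clusters that act as leaf classifiers. The following minimal Python sketch illustrates them under stated assumptions and is not the paper's CFDT algorithm. It models a micro-cluster as a BIRCH-style clustering feature vector (N, LS, SS). To keep it short, a leaf holds only one micro-cluster per class and predicts by nearest centroid. `ClusteringFeature`, `MicroClusterLeaf`, and `hoeffding_bound` are hypothetical names. The last of these computes the standard VFDT split bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)).

```python
import math


class ClusteringFeature:
    """BIRCH-style CF vector (N, LS, SS); an assumed stand-in for a CFDT micro-cluster."""

    def __init__(self, dim):
        self.n = 0               # number of points absorbed
        self.ls = [0.0] * dim    # per-dimension linear sum
        self.ss = 0.0            # sum of squared norms of the absorbed points

    def absorb(self, x):
        # Additive update: each stream point is read once and then discarded.
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
        self.ss += sum(v * v for v in x)

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # RMS distance of members to the centroid, computed from (N, LS, SS) alone.
        c = self.centroid()
        return math.sqrt(max(self.ss / self.n - sum(v * v for v in c), 0.0))


class MicroClusterLeaf:
    """Hypothetical leaf: one CF per class label (a simplification), nearest-centroid prediction."""

    def __init__(self, dim):
        self.dim = dim
        self.clusters = {}       # class label -> ClusteringFeature

    def learn(self, x, label):
        if label not in self.clusters:
            self.clusters[label] = ClusteringFeature(self.dim)
        self.clusters[label].absorb(x)

    def predict(self, x):
        def sq_dist(label):
            c = self.clusters[label].centroid()
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.clusters, key=sq_dist)


def hoeffding_bound(value_range, delta, n):
    """VFDT split test bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


leaf = MicroClusterLeaf(dim=2)
stream = [([0.0, 0.0], "a"), ([0.2, 0.1], "a"), ([5.0, 5.0], "b"), ([4.8, 5.2], "b")]
for x, y in stream:
    leaf.learn(x, y)
print(leaf.predict([0.5, 0.3]))  # prints a
```

CF vectors are additive, so two micro-clusters can be merged by summing their components. This property is what makes single-pass summarization practical. A more faithful leaf would keep several micro-clusters per class rather than one.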
-