A Novel Feature Selection Method Based on Probability Latent Semantic Analysis for Chinese Text Classification
-
Abstract
This paper presents a novel feature selection algorithm for Chinese text classification based on probability latent semantic analysis (PLSA). The algorithm first employs the expectation-maximization (EM) method to calculate the correlations between words and latent topics for the documents of each category. It then selects feature words for each latent topic and merges those words to describe the documents of the corresponding category. Finally, it merges the feature words of all categories into the set of classification feature words. An empirical comparison with four other effective feature selection methods on a benchmark dataset is presented. The results show that the proposed method achieves the best classification performance among the methods compared.
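The pipeline summarized above (per-category PLSA fitted with EM, per-topic word selection, then merging across categories) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the selection criterion, so ranking words by the estimated P(w|z) and keeping the top `top_k` per topic is an assumption. The names `plsa`, `select_features`, `n_topics`, and `top_k` are hypothetical, as is the toy corpus. Documents are assumed to be already segmented into words.

```python
import random


def normalize(row):
    """Scale a list of non-negative numbers to sum to 1 (uniform if all zero)."""
    total = sum(row)
    if total <= 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA with EM on a document-term count matrix; return P(w|z)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialisation of P(w|z) and P(z|d).
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    for _ in range(n_iter):
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z|d,w) proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        # M-step: renormalise the expected counts.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z


def select_features(docs_by_category, n_topics=2, top_k=2):
    """Pick top_k words per topic per category, then merge across categories."""
    selected = set()
    for docs in docs_by_category.values():
        vocab = sorted({word for doc in docs for word in doc})
        index = {word: i for i, word in enumerate(vocab)}
        counts = []
        for doc in docs:
            row = [0] * len(vocab)
            for word in doc:
                row[index[word]] += 1
            counts.append(row)
        for topic in plsa(counts, n_topics):
            # Assumed criterion: highest P(w|z) within the topic.
            ranked = sorted(range(len(vocab)), key=lambda i: topic[i],
                            reverse=True)
            selected.update(vocab[i] for i in ranked[:top_k])
    return sorted(selected)


# Toy, pre-segmented corpus (hypothetical): sports and finance categories.
docs_by_category = {
    "体育": [["足球", "比赛", "球员"], ["篮球", "比赛", "球员", "比赛"]],
    "财经": [["股票", "市场", "银行"], ["银行", "利率", "市场", "市场"]],
}
features = select_features(docs_by_category, n_topics=2, top_k=2)
```

The resulting `features` list would then define the vocabulary used to vectorize documents for whichever classifier is applied. Fitting PLSA separately per category follows the abstract's description. The fixed seed only makes the sketch reproducible.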
-