On features selection approach for text mining problem
This paper proposes an approach to selecting classification features for a text mining problem. The initial feature system is defined in a high-dimensional space, while the training data set is relatively small. The classes form a large, heavily overlapping system. An algorithm for generating feature subsets is proposed. It is based on the compactness hypothesis: in every resulting feature subset, the nearest neighbor of an element that belongs to class k should also belong to class k, and the nearest neighbor of an element that does not belong to class k should not belong to class k. The algorithm was used to solve a medical document classification problem offered by the JRS 2012 Contest team. In classification accuracy, the proposed approach outperforms the nearest neighbors method and the Random Forest algorithm.
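The compactness hypothesis stated above can be turned into a score for a candidate feature subset. The following is a minimal sketch, not the authors' subset generation algorithm, which the abstract does not describe. The function names (`nearest_neighbor`, `compactness`), the use of squared Euclidean distance, and the toy data are all assumptions made for illustration.

```python
from math import inf


def nearest_neighbor(X, i, features):
    """Index of the sample closest to sample i, using only the given feature columns.

    Assumption: squared Euclidean distance; the abstract does not name a metric.
    """
    best_j, best_d = -1, inf
    for j, row in enumerate(X):
        if j == i:
            continue  # leave-one-out: a sample is not its own neighbor
        d = sum((X[i][f] - row[f]) ** 2 for f in features)
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def compactness(X, in_class_k, features):
    """Fraction of samples whose nearest neighbor has the same class-k membership."""
    agree = 0
    for i in range(len(X)):
        j = nearest_neighbor(X, i, features)
        agree += in_class_k[i] == in_class_k[j]
    return agree / len(X)


# Toy data (made up): feature 0 separates class k, feature 1 does not.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 0.9], [1.2, 9.1]]
in_class_k = [True, True, True, False, False, False]

score_informative = compactness(X, in_class_k, [0])
score_noise = compactness(X, in_class_k, [1])
print(score_informative)  # 1.0
print(score_noise)        # 0.0
```

Under these assumptions, a subset that scores close to 1.0 satisfies the hypothesis for class k on the training data, so a search over feature subsets could use this score as its selection criterion.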
Keywords
text mining, feature selection, classification
Authors
Name | Organization | Email |
Mangalova Ekaterina S. | Reshetnev Siberian State Aerospace University (Krasnoyarsk) | e.s.mangalova@hotmail.com |
Agafonov Evgeny D. | Siberian Federal University (Krasnoyarsk) | agafonov@gmx.de |