On features selection approach for text mining problem | Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitelnaja tehnika i informatika – Tomsk State University Journal of Control and Computer Science. 2013. № 1(22).

On features selection approach for text mining problem

An approach to selecting classification features for text mining problems is proposed. The initial feature system is defined in a high-dimensional space, while the training data set is relatively small. The classes form a large system of overlapping sets. An algorithm for generating feature subsets is proposed. It is based on the compactness hypothesis: in every resulting feature subset, the nearest neighbor of an element that belongs to the k-th class should also belong to the k-th class, and the nearest neighbor of an element that does not belong to the k-th class should not belong to it. Using the algorithm, the medical document classification problem posed by the JRS 2012 Contest team was solved. In classification accuracy, the proposed approach outperforms the nearest neighbors method and the Random Forest algorithm.
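The compactness hypothesis can be illustrated with a small scoring function. This is a minimal sketch and not the authors' algorithm: the function name `compactness_score`, the use of Euclidean distance, and the toy data are all assumptions made for illustration. It measures, for a candidate feature subset, the share of elements whose nearest neighbor has the same membership in class k.

```python
from math import dist, inf


def compactness_score(rows, in_class_k, features):
    """Share of elements whose nearest neighbor, measured on the chosen
    feature subset, has the same membership in class k.

    rows       -- list of feature vectors (one per document)
    in_class_k -- list of booleans: does the document belong to class k?
    features   -- indices of the candidate feature subset (hypothetical API)
    """
    # Project every document onto the candidate feature subset.
    points = [[row[j] for j in features] for row in rows]
    agree = 0
    for i, p in enumerate(points):
        best, best_d = None, inf
        for j, q in enumerate(points):
            if i == j:
                continue  # an element is not its own neighbor
            d = dist(p, q)  # Euclidean distance: an assumption of this sketch
            if d < best_d:
                best, best_d = j, d
        # Count the element if its nearest neighbor shares its membership.
        agree += in_class_k[best] == in_class_k[i]
    return agree / len(points)


# Toy data (invented): feature 0 separates class k, feature 1 does not.
rows = [[0.0, 0.0], [0.1, 5.0], [1.0, 0.2], [1.1, 5.1]]
in_class_k = [True, True, False, False]

print(compactness_score(rows, in_class_k, [0]))  # 1.0
print(compactness_score(rows, in_class_k, [1]))  # 0.0
```

A score of 1.0 means the subset satisfies the hypothesis on the given data for both members and non-members of class k. Because classes overlap, such a score would be computed separately for each class with its own membership vector.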

Keywords

text mining, feature selection, classification

Authors

Name | Organization | E-mail
Mangalova Ekaterina S. | Reshetnev Siberian State Aerospace University (Krasnoyarsk) | e.s.mangalova@hotmail.com
Agafonov Evgeny D. | Siberian Federal University (Krasnoyarsk) | agafonov@gmx.de

References

JRS 2012 Data Mining Competition: Topical Classification of Biomedical Research Papers. [Electronic resource]. URL: tunedit.org/challenge/JRS12Contest (accessed: 15.04.2012).
Zagoruiko N.G. Applied Methods of Data and Knowledge Analysis. Novosibirsk: Institute of Mathematics Publ., 1999. 270 p. (in Russian).
Breiman L. Random forests // Machine Learning. 2001. V. 45 (1). P. 5-32.