Collective of algorithms with weights for clustering heterogeneous data
The paper considers the problem of clustering heterogeneous data. By heterogeneous data we mean data that contain different structures, such as sphere-like and strip-like clusters or various geometric figures. To improve the grouping quality for such data, we suggest using an ensemble of different clustering algorithms. An algorithm is included in the ensemble on the assumption that it produces better results for a specific type of structure. It is further assumed that the experiment is designed so that the algorithms work independently and that each algorithm runs on independently chosen sets of parameters (learning conditions). To construct the final decision, the behavior of each algorithm in the ensemble is analyzed, and a weight is assigned to the algorithm on that basis. A probabilistic model of ensemble clustering with latent classes and algorithm weights is introduced. Using the model, an expression for the upper bound on the classification error probability is derived. A method of weight selection that minimizes this bound is suggested. The procedure for constructing the ensemble and finding the weights is implemented in a corresponding algorithm. The efficiency of the suggested method is demonstrated by Monte Carlo modeling.
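As a rough illustration of how per-algorithm weights can enter a final ensemble decision, the sketch below combines several partitions through a weighted co-association matrix and reads off consensus clusters. This is an assumption-laden example, not the paper's procedure: the co-association scheme, the function names `weighted_coassociation` and `consensus_clusters`, the 0.5 threshold, and the weight values are all hypothetical, and the weights are fixed by hand rather than chosen by minimizing the error bound.

```python
def weighted_coassociation(partitions, weights):
    """Weighted fraction of partitions that put objects i and j together.

    partitions: list of label lists, one per ensemble member (hypothetical input)
    weights:    non-negative weight per member (hand-picked here, NOT derived
                from the error bound discussed in the abstract)
    """
    n = len(partitions[0])
    total = sum(weights)
    H = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    H[i][j] += w / total
    return H


def consensus_clusters(H, threshold=0.5):
    """Connected components of the graph with an edge where H[i][j] > threshold."""
    n = len(H)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and H[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical ensemble members clustering six objects.
partitions = [
    [0, 0, 0, 1, 1, 1],  # member A, given the largest weight
    [0, 0, 1, 1, 1, 1],  # member B
    [0, 0, 0, 0, 1, 1],  # member C
]
weights = [0.6, 0.2, 0.2]

H = weighted_coassociation(partitions, weights)
final = consensus_clusters(H)
print(final)
```

In this toy run the more heavily weighted member dominates the pairs on which the members disagree, so the consensus follows its partition. Any rule that produces non-negative weights could be substituted for the hand-picked values.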
Keywords
cluster analysis, collective decision, algorithms with weights, probability of wrong classification
Authors
| Name | Organization | Email |
| Berikov Vladimir B. | Sobolev Institute of Mathematics, Siberian Branch of the Russian Academy of Sciences (Novosibirsk) | berikov@math.nsc.ru |