The generalizing ability of algorithms by the measure of compactness
To estimate the generalizing ability of recognition algorithms, a measure of compactness is proposed. A training sample $E_0 = \{S_1, \dots, S_m\}$ is assumed to be given and divided into disjoint classes $K_1, \dots, K_l$, $l \ge 2$. The objects of $E_0$ are described by a set of heterogeneous features $X(n) = (x_1, \dots, x_n)$. The compactness value depends on the dimension and composition of the feature set, on the number of noise objects to be removed, and on the number of objects-standards of the minimal coverage of $E_0$. The compactness measure on $E_0$ in a feature set $X(k) \subset X(n)$ ($k < n$) is calculated as

$$F(X(k), X) = \frac{m - Sh(X, X(k))}{m} \cdot \frac{m - Sh(X, X(k))}{CF},$$

where $CF$ is the number of objects-standards of the minimal coverage of the sample from which $Sh(X, X(k))$ noise objects have been removed.

Let $S_k \in K_i$, $\rho(S_k, S_r) = \min_{S_j \in CK_i} \rho(S_k, S_j)$, where $CK_i = E_0 \setminus K_i$, and let $Z = |\{S_\tau \in K_i \mid \rho(S_k, S_\tau) < \rho(S_k, S_r)\}|$ be the number of objects in the hypersphere centered at $S_k$. The object $S_r \in CK_i$ is considered a noise object if

$$\frac{ZZ - 1}{|K_i|} > \frac{1}{m - |K_i|},$$

where $ZZ = |\{S_p \in K_i \mid \rho(S_r, S_k) < \rho(S_p, S_k) < \rho(S_\eta, S_k)\}|$ and $\rho(S_\eta, S_k) = \min_{S_j \in CK_i \setminus \{S_r\}} \rho(S_j, S_k)$. The value $ZZ$ is the number of representatives of class $K_i$ added to the hypersphere centered at $S_k \in K_i$ after the noise object $S_r$ is removed.

To find informative sets $\{X(k) \mid X(k) \subset X(n)\}$, two criteria are proposed. Neither criterion explicitly uses the number $CF$ of objects-standards of the minimal coverage. The generalizing ability of the algorithms was estimated by cross-validation on both the initial and the informative feature sets. The highest values were obtained on the sets selected by the criterion

$$R(E_0, \rho) = \frac{1}{m} \sum_{i=1}^{l} m_i \Theta_i \to \max,$$

where $m_i$ is the number of objects of $K_i$ after the noise objects are removed, and $\Theta_i$ is the compactness of $K_i$ calculated from the minimal number of disjoint groups of its objects under the metric $\rho$.
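The noise-object rule and the formulas for $F$ and $R$ can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the Euclidean distance stands in for $\rho$, ties between equally near opponents are broken by index, noise objects are flagged in a single pass without iteration, and $F$ is taken as the product $\frac{m - Sh}{m} \cdot \frac{m - Sh}{CF}$. Because computing a minimal coverage is outside the scope of the sketch, $CF$ and the $\Theta_i$ are passed in as inputs; all function names and the sample data are hypothetical.

```python
import math
from typing import Callable, Sequence, Set

Point = Sequence[float]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Assumed stand-in for the metric rho."""
    return math.dist(a, b)


def noise_objects(X: Sequence[Point], y: Sequence[int],
                  rho: Metric = euclidean) -> Set[int]:
    """Single pass of the noise rule: return indices r of flagged objects S_r."""
    m = len(X)
    flagged: Set[int] = set()
    for k in range(m):
        same = [j for j in range(m) if y[j] == y[k]]  # class K_i of S_k
        opp = [j for j in range(m) if y[j] != y[k]]   # complement CK_i
        if not opp:
            continue
        # Opponents ordered by distance to S_k; the stable sort breaks ties by index.
        opp.sort(key=lambda j: rho(X[j], X[k]))
        r = opp[0]                                     # nearest opponent S_r
        d_r = rho(X[r], X[k])
        # rho(S_eta, S_k): distance to the nearest opponent once S_r is set aside.
        d_eta = rho(X[opp[1]], X[k]) if len(opp) > 1 else math.inf
        # ZZ: members of K_i that enter the hypersphere around S_k when S_r is removed.
        zz = sum(1 for p in same if d_r < rho(X[p], X[k]) < d_eta)
        size_i = len(same)
        if (zz - 1) / size_i > 1 / (m - size_i):
            flagged.add(r)
    return flagged


def compactness_F(m: int, sh: int, cf: int) -> float:
    """F = (m - Sh)/m * (m - Sh)/CF; CF must come from a minimal coverage."""
    return (m - sh) / m * ((m - sh) / cf)


def criterion_R(m: int, m_i: Sequence[int], theta_i: Sequence[float]) -> float:
    """R = (1/m) * sum_i m_i * Theta_i, with Theta_i supplied by the caller."""
    return sum(mi * th for mi, th in zip(m_i, theta_i)) / m


# Hypothetical 1-D sample: class 1 has one object (0.5) lying inside class 0.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [0.5],
     [20.0], [21.0], [22.0], [23.0], [24.0]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(noise_objects(X, y))                  # {5}
print(compactness_F(11, 1, 2))              # CF = 2 is a hypothetical value
print(criterion_R(11, [5, 5], [1.0, 0.5]))  # Theta_i values are hypothetical
```

Here the object at 0.5 is flagged because, for the class-0 centers 0.0, 1.0 and 2.0, removing it lets enough class-0 objects into the hypersphere to satisfy the inequality. The ratio $(m - Sh)/CF$ inside $F$ is the average number of objects attracted by one object-standard.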
The values of $R(E_0, \rho)$ lie in $(0, 1]$ and can be interpreted in terms of fuzzy logic. A direct correlation is shown between the cross-validation estimates and the average number of objects attracted by an object-standard of the minimal coverage of the training sample. It is concluded that the compactness measure $F(X(k), X)$ can serve as an indicator of generalizing ability. This measure is recommended for evaluating the quality of recognition algorithms in data mining.
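A cross-validation estimate of generalizing ability can be sketched as a leave-one-out accuracy. The abstract does not name the recognition algorithms, so a 1-nearest-neighbour rule with Euclidean distance is assumed here; `loo_1nn_accuracy` and the sample data are hypothetical.

```python
import math
from typing import Sequence


def loo_1nn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour rule (assumed classifier)."""
    m = len(X)
    correct = 0
    for i in range(m):
        # Classify S_i by its nearest neighbour among the other m - 1 objects;
        # min() keeps the first index on ties.
        nearest = min((t for t in range(m) if t != i),
                      key=lambda t: math.dist(X[i], X[t]))
        correct += int(y[nearest] == y[i])
    return correct / m


X = [[0.0], [1.0], [2.0], [3.0], [4.0], [0.5],
     [20.0], [21.0], [22.0], [23.0], [24.0]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(loo_1nn_accuracy(X, y))                          # 8/11
print(loo_1nn_accuracy(X[:5] + X[6:], y[:5] + y[6:]))  # 1.0
```

On this sample the objects 0.0, 1.0 and 0.5 are misclassified, giving 8/11; dropping the class-1 object at 0.5, which lies inside class 0, raises the estimate to 1.0.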
Keywords
measure of compactness, noise objects, informative features, objects-standards
Authors
| Name | Organization | Email |
|---|---|---|
| Ignatiev Nikolay A. | National University of Uzbekistan | ignatev@rambler.ru |
References
 
The generalizing ability of algorithms by the measure of compactness. Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitelnaja tehnika i informatika – Tomsk State University Journal of Control and Computer Science. 2018. № 42. DOI: 10.17223/19988605/42/5