Serial and complex description of the order of components in data sets | Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitelnaja tehnika i informatika – Tomsk State University Journal of Control and Computer Science. 2018. № 42. DOI: 10.17223/19988605/42/4

Serial and complex description of the order of components in data sets

The paper describes the concept of the «order of a data sequence», which is defined as a special kind of tuple, the «vector of order». The components of an order are integers that do not exceed its length $n$; the first occurrences of distinct numbers $j \le m \le n$ increase by one. Earlier works presenting means for the formal description and analysis of the order of data sequences considered long tuples (symbolic sequences, data sets) in which distinct components nearly always alternate along the chain, while series (runs of identical elements in a row) are rare and short. Computer processing of large «text» data sets (prose, poems, musical compositions, nucleotide sequences) showed that the characteristics of order are highly sensitive to the arrangement of components (letters, words, notes, etc.) in long and very long tuples. The proposed means represent a symbolic sequence by its order. Decomposing the order yields congeneric sequences, in each of which the places occupied by identical elements ($n_j$ of them) are marked with an integer and all other positions are empty. In these congeneric sequences the interval between the nearest occupied positions serves as the basic measure; it is calculated as $\Delta_{ij}$, the number of empty positions plus one.

This paper discusses means for analyzing ordered data sets that consist mainly of alternating series of identical messages, for example digitized images or sequences of measured values. These means are a set of «serial» characteristics of order, which are defined, named and denoted in a similar way to the previously introduced «interval» characteristics of order. The series length is used as the basic measured value; it is the number of occupied places in a row in the $j$-th congeneric sequence. The length of a series is denoted $\tau_{ij}$ ($i$ is the number of the element in the $j$-th congeneric sequence).
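The construction of an order, its decomposition into congeneric sequences, and the two basic measures (interval and series length) can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation: all function names are hypothetical, positions are 0-based, and intervals are taken only between neighbouring occupied positions inside the sequence, because the treatment of the sequence boundaries is not specified here.

```python
from itertools import groupby


def order_vector(sequence):
    """Replace each element by the 1-based rank of its first occurrence."""
    first_seen = {}
    order = []
    for item in sequence:
        if item not in first_seen:
            first_seen[item] = len(first_seen) + 1
        order.append(first_seen[item])
    return order


def congeneric_positions(order):
    """Map each number j to the 0-based positions it occupies in the order."""
    positions = {}
    for index, j in enumerate(order):
        positions.setdefault(j, []).append(index)
    return positions


def intervals(occupied):
    """Interval between neighbouring occupied positions: empty places + 1."""
    return [b - a for a, b in zip(occupied, occupied[1:])]


def series_lengths(order, j):
    """Lengths of the runs of consecutive places occupied by number j."""
    return [sum(1 for _ in run) for value, run in groupby(order) if value == j]


order = order_vector("aabcaabbb")
print(order)                                    # [1, 1, 2, 3, 1, 1, 2, 2, 2]
print(intervals(congeneric_positions(order)[1]))  # [1, 3, 1]
print(series_lengths(order, 2))                 # [1, 3]
```

In the example, the letter `a` occupies positions 0, 1, 4 and 5, so two empty places separate its second and third occurrences and the corresponding interval is 3.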
Below, some of the serial characteristics of the system are given. The serial volume of the $j$-th congeneric sequence is defined as (1), and the serial volume of the complete sequence is defined as (2):

$$V_{\tau j} = \prod_{i=1}^{n_j} \tau_{ij}, \qquad (1)$$

$$V_{\tau} = \prod_{j=1}^{m} V_{\tau j}. \qquad (2)$$

The total spread of all series of the $j$-th congeneric sequence is defined as (3), and the total spread of all the series in the complete sequence is defined as (4):

$$L_j = n_j \cdot \log_2 \tau_{gj}, \qquad (3)$$

$$L = n \cdot \log_2 \tau_g, \qquad (4)$$

where $\tau_{gj}$ is the geometric mean length of the series in the $j$-th congeneric sequence; $\tau_g$ is the geometric mean length of the series in the complete sequence; $l_j = \log_2 \tau_{gj}$ is the average spread of the series in the $j$-th congeneric sequence; $l = \log_2 \tau_g$ is the average spread of the series in the complete sequence. The «complete» serial description of an order is defined by the following distributions: $\{L_j\}$, {}.

The order of a sequence that contains series of elements in addition to «sparse» data is described by complexes of numerical characteristics and their distributions, which take into account both the intervals $\Delta_{ij}$ with their remoteness $g_{ij} = \log_2 \Delta_{ij}$ and the series lengths $\tau_{ij}$ with their spread $l_{ij} = \log_2 \tau_{ij}$. Describing the order of such a sequence also requires preliminary decomposition into congeneric sequences. A separate interval description and a separate serial description of an order can be carried out. The capacity of the $i$-th message, $v_{ij} = \Delta_{ij} \tau_{ij}$, is introduced. Based on the capacity of messages, a set of «capacitive» characteristics of an order is defined, among which are the following: $V = V_{\Delta} \cdot V_{\tau}$, $v_g = \Delta_g \cdot \tau_g$, $C = G + L$, $c = g + l$, where $V$ and $V_{\Delta}$ are the complete and interval volumes of the order; $G$ is the total remoteness (depth) of the order; $\Delta_g$ is the geometric mean interval and $g$ is the average remoteness of messages in a complete sequence; $C = \log_2 V$ is the complete size of the data array; $c = \log_2 v_g$ is the average size of messages in the data array.
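The serial characteristics can be computed directly from the run lengths of an order vector. The Python sketch below is illustrative only: the function and key names are hypothetical, and it assumes that the counts $n_j$ and $n$ in (3) and (4) are the numbers of series over which the geometric means are taken, so that the total spread equals the base-2 logarithm of the serial volume.

```python
import math
from itertools import groupby


def serial_characteristics(order):
    """Serial volume, total spread, average spread and geometric mean length."""
    # Collect the series lengths tau_ij for every number j in the order.
    runs = {}
    for j, run in groupby(order):
        runs.setdefault(j, []).append(sum(1 for _ in run))

    def summarize(lengths):
        spread = sum(math.log2(t) for t in lengths)  # total spread L
        mean_spread = spread / len(lengths)          # average spread l
        return {
            "volume": math.prod(lengths),            # serial volume (product of lengths)
            "spread": spread,
            "mean_spread": mean_spread,
            "geometric_mean": 2 ** mean_spread,      # geometric mean series length
        }

    per_sequence = {j: summarize(lengths) for j, lengths in runs.items()}
    all_lengths = [t for lengths in runs.values() for t in lengths]
    return per_sequence, summarize(all_lengths)


per_sequence, complete = serial_characteristics([1, 1, 2, 3, 1, 1, 2, 2, 2])
print(per_sequence[1]["volume"], complete["volume"])  # prints: 4 12
```

For this order the series lengths are 2 and 2 for number 1, 1 and 3 for number 2, and 1 for number 3, so the per-sequence volumes are 4, 3 and 1, and their product gives the complete serial volume 12. Working with the spreads (sums of logarithms) instead of the volumes avoids very large products on long sequences.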
The developed tools and the large set of numerical characteristics reflect a variety of properties of a new abstract object, the order of an information chain.


Keywords

order of a chain, numerical characteristics of order, remoteness of a message, spread of series, inter-nucleotide distance

Authors

Name | Organization | E-mail
Gumenyuk Alexander S. | Omsk State Technical University | gumas45@mail.ru
