Formal analysis of order in the local structure of the nucleotide sequences | Tomsk State University Journal of Control and Computer Science. 2014. № 4(29).

Formal analysis of order in the local structure of the nucleotide sequences

The definition of chain order and the integral characteristics of order, in particular for nucleotide sequences, were presented in previous papers. These characteristics proved highly sensitive to the arrangement of components. The possibility of comparison, classification, and hashing based on the introduced formalisms and the characteristics of order has been considered. The approach developed by the authors maps the local structure of sign sequences of arbitrary nature to numerical sequences that represent the arrangement of their components. The generally accepted method for studying large arrays of measurement data, linguistic texts, nucleotide sequences, and long sequences of other kinds is the "window scan". This paper describes means for analysing the local structure of complete full-length sequences based on the characteristics of order of separate fragments (L-grams), called the functions of characteristics of order. The values of these functions are calculated as follows:

$$\Delta_{ij} = x_{i+1,j} - x_{ij}, \qquad x_{ij} \in [s \cdot k,\; s \cdot k + l],$$

$$f_G(k,l,s) = \sum_{j=1}^{m} \sum_{i} \log_2 \Delta_{ij},$$

$$f_g(k,l,s) = \frac{1}{l} \sum_{j=1}^{m} \sum_{i} \log_2 \Delta_{ij},$$

$$f_{\Delta g}(k,l,s) = \Bigl( \prod_{j=1}^{m} \prod_{i} \Delta_{ij} \Bigr)^{1/l},$$

$$f_r(k,l,s) = \frac{f_{\Delta g}(k,l,s)}{f_D(k,l,s)},$$

where $x_{ij}$ is the position of the i-th occurrence of the j-th element of the alphabet within the current fragment; m is the cardinality of the alphabet; k is the fragment number; s is the step size (when s = 1 the fragments become L-grams); l is the window length; $f_G(k,l,s)$ is the depth function; $f_g(k,l,s)$ is the average remoteness function; $f_{\Delta g}(k,l,s)$ is the average geometric interval function; $f_r(k,l,s)$ is the regularity function; and $f_D(k,l,s)$ is the descriptive information function. A larger window allows detecting longer fragments with similar order. As the fragment length increases, the function values tend to the value of the corresponding integral characteristic of the full-length sequence.
Reducing the fragment length allows the values of individual functions to reveal finer features of the arrangement of components within the window. Preliminary studies showed that the relationship between the window length and the dispersion of the characteristic values is hyperbolic. However, when the window length is reduced to the cardinality of the alphabet (m = 4), this dependence breaks down. Thus, as expected, the uncertainty in the location of a fragment is associated with the uncertainty of the function values obtained for a given window length. Selecting an optimal window length for various tasks, including its dependence on the cardinality of the alphabet of the original sequence, requires additional research. Software for calculating and displaying the functions of characteristics of order has been developed and tested on the ribosomal RNA of several organisms. The study revealed the influence of fragment (L-gram) length on the shape of the functions of characteristics of order. The use of these functions for finding similar or overlapping fragments in one or more sequences is considered, as is the inverse problem of finding occurrences of specified fragments in a complete genome sequence. Representing the order of nucleotide sequences by functions also makes it possible to apply classical mathematical methods, such as mathematical analysis, spectral analysis, and correlation analysis, which would be impossible when analysing the symbolic sequences directly. The graphical representation of the functions of characteristics of order also supports expert analysis of long nucleotide sequences, including complete genome sequences.
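
The window scan described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software. It assumes that an interval is the distance between consecutive occurrences of the same nucleotide inside one fragment, that occurrences outside the fragment are ignored, that logarithms are base 2, and that the averaged functions are normalised by the window length l. The names `window_intervals` and `order_functions` are hypothetical. The descriptive information and regularity functions are omitted because the abstract does not spell out how descriptive information is computed.

```python
import math


def window_intervals(fragment):
    """Distances between consecutive occurrences of each symbol in a fragment.

    Assumption: only occurrences inside the fragment are considered, so the
    first occurrence of a symbol contributes no interval.
    """
    last_seen = {}
    intervals = []
    for pos, sym in enumerate(fragment):
        if sym in last_seen:
            intervals.append(pos - last_seen[sym])
        last_seen[sym] = pos
    return intervals


def order_functions(sequence, l, s=1):
    """Scan `sequence` with a window of length l and step s.

    Returns three lists indexed by fragment number k:
    depth, average remoteness, average geometric interval.
    Assumption: averages are normalised by the window length l.
    """
    depth, remoteness, geo_interval = [], [], []
    for start in range(0, len(sequence) - l + 1, s):
        deltas = window_intervals(sequence[start:start + l])
        g_sum = sum(math.log2(d) for d in deltas)  # depth of the fragment
        depth.append(g_sum)
        remoteness.append(g_sum / l)
        # (product of intervals) ** (1 / l), computed via the logarithm
        geo_interval.append(2 ** (g_sum / l))
    return depth, remoteness, geo_interval


if __name__ == "__main__":
    # Two non-overlapping fragments of length 4: "ACAC" and "GGGG".
    d, g, dg = order_functions("ACACGGGG", l=4, s=4)
    print(d, g, dg)
```

For the fragment ACAC the intervals are 2 (for A) and 2 (for C), so its depth under these assumptions is log2 2 + log2 2 = 2. With s = 1 the same function yields one value per L-gram, giving the numerical sequence to which spectral or correlation analysis could then be applied.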

Keywords

chain order, sequence, nucleotide sequence, order characteristics, functions of order characteristics, L-grams, local structure of the nucleotide chain

Authors

Name | Organization | E-mail
Gumenyuk Alexander S. | Omsk State Technical University | gumas45@mail.ru
Pozdnichenko Nikolay N. | Omsk State Technical University | nick670@yandex.ru
Shpynov Stanislav N. | Gamaleya Institute of Epidemiology and Microbiology | stan63@inbox.ru
Total: 3
