Map of genes - new tool for representation of a single-chromosome genomes and their components | Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitelnaja tehnika i informatika – Tomsk State University Journal of Control and Computer Science. 2018. № 43. DOI: 10.17223/19988605/43/7

Map of genes - new tool for representation of a single-chromosome genomes and their components

Previous work introduced a new approach, formal order analysis (FOA), which is based on Mazur's information theory and makes it possible to describe and analyze ordered data arrays of various nature (information chains). The approach directly takes into account the arrangement of elements in a sequence. Connections between the elements of an order (individual informations) are calculated as intervals between nearest identical elements (for nucleotide sequences, these are inter-nucleotide distances). The product of the intervals gives the number of descriptive informations, and the binary logarithm of this value gives the number of identifying informations. Characteristics calculated in this way represent the arrangement of elements in the whole object. Previously, order characteristics have been used in the study of genetic sequences for the following purposes: classification of prokaryotes at the levels of species, genus, and family; classification of organisms at higher taxonomic levels; determination of the similarity of genetic sequences by comparing the characteristics of distributions of congeneric sequences and using the corresponding matrices; study of the local structure of nucleotide sequences; search for sequence fragments with the same order; etc. The logical development of representing nucleotide sequences by order characteristics was the idea of "mapping" genomes and their components. In this case, the characteristic of the whole genome is plotted along the x axis and the characteristic of its components along the y axis; each dot represents a separate component of the genome. Cartographic representation of a set of organisms supports expert analysis aimed at finding similarities between individual components and, consequently, between whole genomes. Currently, GenBank is the largest library of nucleotide sequences and, in particular, of whole genomes.
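The interval-based characteristics described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and the treatment of the first occurrence of each element (measured here from a virtual position 0 before the start of the sequence) is an assumption, since the abstract does not state which boundary convention is used.

```python
import math
from typing import Dict, List


def intervals(seq: str) -> List[int]:
    """Interval from each element to the previous identical element.

    Assumption (not from the source): the first occurrence of an element
    is measured from a virtual position 0 before the sequence start.
    """
    last_seen: Dict[str, int] = {}
    result: List[int] = []
    for pos, sym in enumerate(seq, start=1):
        result.append(pos - last_seen.get(sym, 0))
        last_seen[sym] = pos
    return result


def descriptive_informations(seq: str) -> int:
    """Product of all intervals (exact, thanks to Python big integers)."""
    return math.prod(intervals(seq))


def identifying_informations(seq: str) -> float:
    """Binary logarithm of the product, computed as a sum of logarithms
    so that long sequences do not require the huge product itself."""
    return sum(math.log2(d) for d in intervals(seq))


if __name__ == "__main__":
    print(intervals("ACAAC"))                 # [1, 2, 2, 1, 3]
    print(descriptive_informations("ACAAC"))  # 12
```

Summing logarithms instead of taking the logarithm of the product is a design choice for numerical convenience on genome-length sequences; both give the same value up to floating-point error.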
When uploading a sequence that represents a complete genome, authors can supply their own annotation or use the automatic annotation tool provided by GenBank. Such an annotation includes information about the location of the different components in the genome. Most genomes have two annotations: one uploaded by the authors and another produced automatically by the GenBank annotation tool. Unfortunately, different annotations of the same sequence can differ considerably, which makes it difficult to study and compare organisms by their components. Annotations in GenBank are semi-structured and not adapted for fully automatic processing. In particular, each coding region is marked twice, as a CDS (coding sequence) and as a gene, while, for example, rRNA is marked only once. Because of imperfections in both automatic and manual annotations, the exact position and length of many components are unknown, and annotations of similar genomes often list very different sets of components, which further complicates the comparison of organisms using existing annotations. Cartography of genome components makes it possible to use order characteristics to compare components both within a single genome and across several genomes of closely related organisms. This approach is relevant for the detection and identification of (unnamed) coding regions and other important components. Another application of the approach may be determining the functional and structural purpose of coding regions of the genome. The software allows one to filter, sort, and compare samples of genome and plasmid components and to divide them into groups.
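As a rough illustration of how map points could be derived from an annotation, the sketch below drops "gene" records that duplicate a CDS and then pairs a whole-genome characteristic (x) with each component's characteristic (y). Everything here is an assumption made for illustration: the `Feature` record, the function names, the rule that a duplicate gene has exactly the same coordinates as its CDS, and the choice of the binary-logarithm characteristic with first occurrences measured from a virtual position 0. Real GenBank records are more complex (for example, joined locations and strands) and would need a proper parser.

```python
import math
from typing import Dict, List, NamedTuple, Tuple


class Feature(NamedTuple):
    kind: str   # hypothetical feature type, e.g. "gene", "CDS", "rRNA"
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def order_characteristic(seq: str) -> float:
    """Sum of log2 of intervals between nearest identical elements.

    Assumption: a first occurrence is measured from virtual position 0.
    """
    last_seen: Dict[str, int] = {}
    total = 0.0
    for pos, sym in enumerate(seq, start=1):
        total += math.log2(pos - last_seen.get(sym, 0))
        last_seen[sym] = pos
    return total


def drop_duplicate_genes(features: List[Feature]) -> List[Feature]:
    """Remove 'gene' records whose coordinates coincide with a CDS record."""
    cds_spans = {(f.start, f.end) for f in features if f.kind == "CDS"}
    return [
        f for f in features
        if not (f.kind == "gene" and (f.start, f.end) in cds_spans)
    ]


def map_points(genome: str, features: List[Feature]) -> List[Tuple[float, float]]:
    """One (x, y) point per component: x for the genome, y for the component."""
    x = order_characteristic(genome)
    return [
        (x, order_characteristic(genome[f.start:f.end]))
        for f in drop_duplicate_genes(features)
    ]


if __name__ == "__main__":
    genome = "ACAACGGT"
    features = [Feature("gene", 0, 5), Feature("CDS", 0, 5), Feature("rRNA", 5, 8)]
    for point in map_points(genome, features):
        print(point)
```

All components of one genome share the same x value, so they fall on a vertical line; plotting several genomes side by side gives the kind of map the abstract describes.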


Keywords

formal order analysis, inter-nucleotide distance, genes map, hashing with order characteristics

Authors

Name | Organization | E-mail
Pozdnichenko Nikolai N. | Omsk State Technical University | nick670@yandex.ru
Gumenyuk Alexander S. | Omsk State Technical University | gumas45@mail.ru
Shpynov Stanislav N. | N.F. Gamaleya FRCEM | stan63@inbox.ru
