Using TXM Platform for Research on Language Changes over Time: the Dynamics of Vocabulary and Punctuation in Russian Literary Texts | Vestnik Tomskogo gosudarstvennogo universiteta. Filologiya – Tomsk State University Journal of Philology. 2021. № 70. DOI: 10.17223/19986645/70/5

Using TXM Platform for Research on Language Changes over Time: the Dynamics of Vocabulary and Punctuation in Russian Literary Texts

This article tests the methodological tools that the open-source TXM software provides for studying the dynamics of vocabulary and punctuation marks in diachronic corpora. TXM supports both quantitative and qualitative analysis. The analysis shows that the Russian Revolution of 1917 brought significant changes to the core vocabulary of the Corpus of Russian Short Stories (1901-1930). The same methodology can be applied both to diachronic studies of literature and to various NLP tasks.

Keywords

stylometry, textometry, TXM platform, corpus linguistics, Russian literature of the 20th century, vocabulary, punctuation, diachronic linguistics

Authors

Name | Organization | E-mail
Lavrentiev Alexey M. | French National Centre for Scientific Research | alexei.lavrentev@ens-lyon.fr
Sherstinova Tatiana Yu. | Higher School of Economics; Saint-Petersburg State University | tsherstinova@hse.ru
Chepovskiy Andrey M. | Higher School of Economics; Peoples' Friendship University of Russia (RUDN) | achepovskiy@hse.ru
Pincemin Bénédicte | French National Centre for Scientific Research | benedicte.pincemin@ens-lyon.fr
Total: 4
