The Multi-Parameter Analysis of Linguistic Data in the Information System Semograf (On the Example of the Study of Social Network Users' Speech)
The aim of this article is to demonstrate the capabilities of the information system Semograph (http://semograph.org) as a tool for text content analysis when implementing a network approach to the organization of scientific research in linguistics. Semograph can be used for analyzing text data; creating and/or annotating language and text corpora; conducting, processing and analyzing psycholinguistic and sociolinguistic experiments; developing classifiers and thesauri; and solving other problems that arise in the analysis of language material. Semograph implements the principles of a full research cycle, network distribution of research participants, a multi-user mode of operation and methodological pluralism. The possibilities of network organization of work in Semograph are illustrated by a multiparametric analysis of the speech behavior, social parameters and psychological characteristics of users of the social network VKontakte. The automatically collected material comprises 18,126 utterances by 340 users who completed the BFI psychological questionnaire, whose results determine the degree to which five personality traits are expressed (extraversion vs. introversion, agreeableness vs. antagonism, conscientiousness vs. lack of direction, neuroticism vs. emotional stability, openness vs. closedness to experience). For the analysis of the text material, a multi-level hierarchical classifier was developed that allows each expert linguist to create and develop a separate classification branch; thus, the same material is examined by different experts from different points of view, yielding a multiparametric linguistic classification. This classification and specific user metadata (gender, psychological characteristics, etc.) provide the basis for constructing a model of interrelations between linguistic parameters of speech and socio-psychological characteristics of a person by means of interactive visual analytics. The article demonstrates these interrelations using examples such as differences in the use of role and spatial deixis by extraverts and introverts, and in the use of abusive and obscene lexical units by users with a strong tendency toward closedness or openness to experience. The resulting model shows that the speech variability of texts is due to the interaction of the informants' psychological and gender characteristics rather than to the action of any single factor. Overall, the article demonstrates that Semograph makes it possible, on the one hand, to analyze large arrays of texts with linguistic and extralinguistic annotations and, on the other hand, to apply a network model of research organization; together these offer advantages in constructing models of fragments of linguistic and sociocultural reality.
Keywords
semantic graph modeling, visual analytics, multiparameter analysis, information system Semograph, social network services, network science
Authors
Name | Organization | Email |
Belousov Konstantin I. | Perm State University | belousovki@gmail.com |
Erofeeva Elena V. | Perm State University | elener-ofee@gmail.com |
Baranov Dmitriy A. | Perm State University | baranov@semograph.com |
Zelyanskaya Natalya L. | Perm State University | zelyanskaya@gmail.com |
Shchebetenko Sergei A. | Higher School of Economics | shebetenko@rambler.ru |
References

The Multi-Parameter Analysis of Linguistic Data in the Information System Semograf (On the Example of the Study of Social Network Users' Speech) | Vestnik Tomskogo gosudarstvennogo universiteta. Filologiya – Tomsk State University Journal of Philology. 2020. № 64. DOI: 10.17223/19986645/64/1