The Types and Combinatorics of Verbal Markers of Different Emotional Tonalities in Russian-Language Internet Texts
The article presents the theoretical grounds for the concept of the verbal marker, proposes a typology of such markers, and summarizes observations on how combinations of verbal markers affect the accuracy of a computer classifier designed to assign Russian-language Internet texts to different emotional classes. Drawing on a comprehensive analysis of current international scholarship, the authors define the term "verbal marker" as a unit or structure that belongs to one of the levels of the linguistic system, is amenable to parametrization, and appears in the text as an indicator of processes in the human cognitive system that are hidden from direct observation. According to the level of the linguistic system at which the marking unit or structure is located, the authors distinguish the following types of verbal markers relevant to the analysis of written texts: lexical, morphological, syntactic, semantic, punctuation and, finally, textual markers. To test the practical viability of this conception, the authors applied it in a sentiment analysis project aimed at attributing a Russian-language Internet text to a particular class of emotions. The emotional tonality of Internet texts is of particular interest because such texts have become one of the most common forms of text in Russian, and the technology for their automatic assessment has clear commercial and social prospects. The classifier is based on the eight emotions that the Swedish neuroscientist H. Lovheim related to specific combinations of monoamine levels in the limbic system of the human brain. To build the classifier, the authors used supervised machine learning, which requires sample selection and feature extraction.
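The idea of treating verbal markers as parametrizable classifier features can be illustrated with a minimal sketch. It covers only two of the six marker types (lexical and punctuation) and turns them into per-token frequencies. The marker inventories, the feature names and the function `marker_features` are hypothetical and chosen for illustration; the article does not list its actual markers or describe its feature encoding.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical marker inventories, for illustration only;
# they are NOT the markers used by the authors.
LEXICAL_MARKERS = {"боюсь", "страшно", "рад", "счастлив"}
PUNCTUATION_MARKERS = {"!": "exclamation", "?": "question", "...": "ellipsis"}

TOKEN_RE = re.compile(r"\w+")  # Unicode-aware word tokens in Python 3


def marker_features(text: str) -> Dict[str, float]:
    """Return per-token frequencies of lexical and punctuation markers."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    counts = Counter(tokens)

    features: Dict[str, float] = {}
    # Lexical markers: how often each marker word occurs, relative to text length.
    for word in sorted(LEXICAL_MARKERS):
        features[f"lex:{word}"] = counts[word] / n
    # Punctuation markers: occurrences of each mark, relative to text length.
    for mark, name in PUNCTUATION_MARKERS.items():
        features[f"punct:{name}"] = text.count(mark) / n
    return features


if __name__ == "__main__":
    print(marker_features("Мне так страшно... Боюсь идти домой!"))
```

A dictionary of this kind could be vectorized and passed to any supervised learner; morphological, syntactic, semantic and textual markers would need a tagger or parser and are left out of the sketch.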
The data comprised 15,000 emotionally rich fragments of 60-80 words each, selected from the public group Podslushano [Overheard] on the Russian social network VK. To extract the sample, the authors first mapped the eight emotional classes of Lovheim's model to a range of hashtags used by the group's editors when publishing users' posts. Second, each text in the sample was assessed by three informants on a crowdsourcing platform. The preliminarily classified data then underwent expert linguistic analysis using multiple tools offered by the corpus manager Sketch Engine. This analysis yielded a feature set for the SVM-based classifier. Analysing the eight text classes with corpus linguistics methods and running a prototype of the classifier showed how the weighted average F1-score changes as different verbal markers are incorporated as classifier features. The results showed that lexical and punctuation markers are the most effective. However, syntactic and morphological markers also proved effective for some classes of emotions. In addition, the authors stress the relevance of marker combinations for the accuracy of the statistical models created by the classifier. At present, the F1-score of the classifier varies from 30% to 50% across emotional classes, which is comparable with the results shown by classifiers built for other languages.
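For readers unfamiliar with the metric, the weighted average F1-score mentioned above can be sketched as follows: a per-class F1 is computed for each emotional class and then averaged with weights proportional to the number of true examples in each class. The function name `weighted_f1` and the toy labels are assumptions made for illustration; they do not reproduce the authors' code or data.

```python
from collections import Counter
from typing import Sequence


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Support-weighted average of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: predicted as this class and actually this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        # False positives: predicted as this class but actually another one.
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        # False negatives: actually this class but predicted as another one.
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n_true / total) * f1
    return score


if __name__ == "__main__":
    # Toy labels, not taken from the study.
    gold = ["joy", "joy", "fear", "anger"]
    pred = ["joy", "fear", "fear", "anger"]
    print(round(weighted_f1(gold, pred), 4))
```

Weighting by class support keeps a large class from being under-represented in the aggregate score, which matters when the emotional classes in a sample are of unequal size.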
Keywords
emotion,
Internet texts,
sentiment analysis,
verbal markers,
machine learning,
cognition
Authors
| Kolmogorova Anastasia V. | Siberian Federal University | nastiakol@mail.ru |
| Kalinin A.A. | Siberian Federal University | verbalab@yandex.ru |
| Malikova A.V. | Siberian Federal University | malikovaav1304@gmail.com |
Total: 3