The dataset as a form of dictionary in the digital age (based on the example of a multimodal emotion dataset)
The article substantiates the thesis that, in modern technological society, the traditional linguistic dictionary has acquired a new systemic variant: the dataset. While sharing a common “object-key” organizing principle, dictionaries and datasets also differ in certain respects, which we illustrate using the example of a multimodal emotion dataset designed for studying emotional speech in Russian and for assessing the quality of automatic emotion detection across modalities with computational models. The article aims to demonstrate the potential of datasets as a new form of systematizing and manifesting linguists’ expert knowledge in the digital era. The corpus comprises 173 minutes of video recordings of emotional narratives collected using the autobiographical MIP (mood induction procedure) method with the participation of eleven women aged 19-26. The recordings were divided into 909 fragments, each annotated on six emotion scales (joy, sadness, anger, surprise, fear, disgust) from 0 to 5 by six annotators (three worked with one half of the sample, three with the other) in four formats: multimodal, and separate audio, text, and video fragments. The key findings of the dataset analysis are as follows. (1) Across modalities, the highest inter-annotator agreement was observed for text-only and full multimodal annotations (α = 0.57), and the lowest for video-only annotations (α = 0.30). (2) Across emotion classes, agreement was highest for neutral texts and relatively high for joyful and sad ones, while mixed emotions were recognized least consistently. (3) Joy and surprise are recognized primarily when fragments are presented in audio format; sadness, fear, and disgust are better identified in the audio and text modalities, while anger is recognized most accurately in the text modality only. (4) Presenting fragments in video format reduces recognition accuracy for all emotions, affecting joy the least and fear the most. The dataset has also proven effective as a tool for evaluating eight computational emotion recognition models, including text, audio, and multimodal ones. Text-based models showed the highest alignment with human annotations, while video-based models showed the lowest. Despite some limitations related to data collection and speech segmentation, the dataset is a valuable linguistic resource for emotion recognition research. The authors declare no conflict of interest.
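The agreement scores above are reported as alpha values; given that the reference list cites Hayes and Krippendorff (2007), these are presumably Krippendorff's α. As a hedged illustration only, the Python sketch below shows how such a score can be computed with the open-source krippendorff package on toy data laid out the way the abstract describes (six annotators, each rating one half of the fragments on a 0-5 ordinal scale). The variable names, data, and layout are our assumptions, not the authors' actual pipeline.

```python
# A minimal sketch of the agreement computation, assuming ratings are
# stored as an annotators x fragments matrix for one emotion scale.
# Requires the open-source package: pip install krippendorff
import numpy as np
import krippendorff

nan = np.nan
# Rows = annotators, columns = fragments; rows 0-2 rated the first half
# of the sample, rows 3-5 the second half, so the matrix has NaN gaps,
# mirroring the three-plus-three annotator split described above.
joy_ratings = np.array([
    [4, 5, 0, 1, nan, nan, nan, nan],
    [4, 4, 1, 1, nan, nan, nan, nan],
    [5, 4, 0, 2, nan, nan, nan, nan],
    [nan, nan, nan, nan, 0, 3, 2, 5],
    [nan, nan, nan, nan, 1, 3, 2, 4],
    [nan, nan, nan, nan, 0, 2, 3, 4],
])

# The 0-5 ratings are ordered categories, so ordinal-level alpha is the
# appropriate choice (Hayes & Krippendorff 2007); the package handles
# missing cells natively.
alpha = krippendorff.alpha(reliability_data=joy_ratings,
                           level_of_measurement="ordinal")
print(f"Krippendorff's alpha (toy joy scale): {alpha:.2f}")
```

Repeating this computation per presentation format (text, audio, video, multimodal) and per emotion scale yields the kind of comparison summarized in findings (1) and (2).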
Keywords
multimodality,
dataset,
dictionary,
automatic emotion analysis,
multimodal markup,
speech,
Russian language,
digital linguistics
Authors
Kolmogorova Anastasia V. | HSE University | akolmogorova@hse.ru |
Kulikova Elizaveta R. | HSE University | Kulikova.E.R@hse.ru |
References
Shakhovsky V.I. Substantiation of the linguistic theory of emotions // Voprosy psikholingvistiki [Journal of Psycholinguistics]. 2019. No. 1. P. 22-37.
Babenko L.G. Linguopsychology on the methodological basis of cognitive science: the lexicographic aspect // Izvestia. Ural Federal University Journal. Series 2: Humanities and Arts. 2020. Vol. 22, No. 3 (200). P. 264-278.
Babenko L.G. The alphabet of emotions: a thesaurus of emotive vocabulary. Ekaterinburg; Moscow: Kabinetnyy uchenyy, 2020. 431 p.
A history of Russian lexicography / ed. by F.P. Sorokoletov. St. Petersburg: Nauka, 1998. 610 p.
Kibrik A.A. Multimodal linguistics // Cognitive studies: Collected papers. Issue 4. Moscow: Institute of Psychology RAS, 2010. P. 135-152.
Iriskhanova O.K. Polymodal discourse as an object of study // Polymodal dimensions of discourse / ed. by O.K. Iriskhanova. Moscow: YaSK, 2021. P. 15-33.
The Routledge Handbook of Multimodal Analysis / ed. by C. Jewitt. Routledge, 2016. 527 p.
Pan B., Hirota K., Jia Z., Dai Y. A review of multimodal emotion recognition from datasets, preprocessing, features, and fusion methods // Neurocomputing. 2023. V. 561. P. 126866. doi: 10.1016/j.neucom.2023.126866.
Mlakar I., Kacic Z., Rojc M. A Corpus for investigating the multimodal nature of multi-speaker spontaneous conversations - EVA Corpus // WSEAS Transactions on Information Science and Applications. 2017. V. 14. P. 213-226.
Das A., Sarma M.S., Hoque M.M., Siddique N., Dewan M.A.A. AVaTER: Fusing Audio, Visual, and Textual Modalities Using Cross-Modal Attention for Emotion Recognition // Sensors (Basel). 2024. V. 24 (18). P. 5862. doi: 10.3390/s24185862. PMID: 39338607; PMCID: PMC11436096.
Perepelkina O., Kazimirova E., Konstantinova M. RAMAS: Russian Multimodal Corpus of Dyadic Interaction for studying emotion recognition // PeerJ Preprints. 2018. V. 6. doi: 10.7287/PEERJ.PREPRINTS.26688V1.
Poria S., Hazarika D., Majumder N., Naik G., Cambria E., Mihalcea R. MELD: A multimodal multi-party dataset for emotion recognition in conversations // arXiv. 2018. doi: 10.48550/arXiv.1810.02508.
Zadeh A.B., Liang P.P., Poria S., Cambria E., Morency L.-P. Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph // Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018. V. 1. P. 2236-2246.
Zhao J., Zhang T., Hu J., Liu Y., Jin Q., Wang X., Li H. M3ED: Multi-modal multi-scene multi-label emotional dialogue database // arXiv. 2022. doi: 10.48550/arXiv.2205.10237.
Zhalehpour S., Onder O., Akhtar Z., Erdem C.E. BAUM-1: A Spontaneous Audio-Visual Face Database of Affective and Mental States // IEEE Transactions on Affective Computing. 2017. V. 8 (3). P. 300-313.
Parada-Cabaleiro E., Costantini G., Batliner A., Schmitt M., Schuller B.W. DEMoS - An Italian Emotional Speech Corpus - Elicitation methods, machine learning, and perception // Language Resources and Evaluation. 2020. V. 54. P. 341-383.
Kotov A., Budyanskaya E. The Russian emotional corpus: communication in natural emotional situations // Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference "Dialog". Issue 11 (18): in 2 vols. Vol. 1: Main conference program. Moscow: RGGU Publishing, 2012. P. 296-307.
Ekman P., Davidson R.J. The nature of emotion: Fundamental questions. Oxford: Oxford University Press, 1994. 496 p.
Tomkins S.S. PAT Interpretation: Scope and Technique. New York: Springer Publishing, 1959. 18 p.
Plutchik R. The Nature of Emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice // American Scientist. 2001. V. 89, No. 4. P. 344-350.
Izard C.E. The psychology of emotions. St. Petersburg: Piter, 2006. 464 p.
Wang F., Yu J., Xia R. Generative Emotion Cause Triplet Extraction in Conversations with Commonsense Knowledge // Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. P. 3952-3963.
Ringeval F., Sonderegger A., Sauer J., Lalanne D. Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions // 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013. 2013. P. 1-8.
Busso C., Bulut M., Lee C., Kazemzadeh A., Mower E., Kim S., Chang J.N., Lee S., Narayanan S. IEMOCAP: interactive emotional dyadic motion capture database // Language Resources and Evaluation. 2008. V. 42 (4). P. 335-359.
Tzirakis P., Zafeiriou S., Schuller B. Real-world automatic continuous affect recognition from audiovisual signals // Multimodal Behavior Analysis in the Wild (Computer Vision and Pattern Recognition series) / ed. by X. Alameda-Pineda, E. Ricci, N. Sebe. Academic Press, 2019. P. 387-406.
Hayes A.F., Krippendorff K. Answering the call for a standard reliability measure for coding data // Communication Methods and Measures. 2007. V. 1. P. 77-89.
Mills C., D’Mello S. On the validity of the autobiographical emotional memory task for emotion induction // PLoS One. 2014. V. 9 (4). PMID: 24776697.
Tkachenko M., Malyuk M., Holmanyuk A., Liubimov N. Label Studio: Data labeling software. 2020. Available from: https://github.com/heartexlabs/label-studio.
Starostina E.G., Taylor G.J., Quilty L.C., Bobrov A.E., Moshnyaga E.N., Puzyreva N.V., Bobrova M.A., Ivashkina M.G., Krivchikova M.N., Shavrikova E.P., Bagby M. The Toronto Alexithymia Scale (20 items): validation of the Russian-language version in a sample of internal medicine patients // Sotsialnaya i klinicheskaya psikhiatriya [Social and Clinical Psychiatry]. 2010. No. 20 (4). P. 31-38.
Kondratenko V., Sokolov A., Karpov N., Kutuzov O., Savushkin N., Minkin F. Large Raw Emotional Dataset with Aggregation Mechanism (Version 1) // arXiv. 2022. doi: 10.48550/ARXIV.2212.12266.