Speech disfluencies modeling in automatic speech recognition systems | Vestnik Tomskogo gosudarstvennogo universiteta – Tomsk State University Journal. 2012. № 363.

Speech disfluencies modeling in automatic speech recognition systems

This paper addresses the analysis of speech disfluencies for automatic speech recognition (ASR). Speech disfluencies differ in origin: they may be caused by external influence or by an internal failure in planning the speech act. Planning failures take various forms, among them filled pauses, self-repairs and slips of the tongue. Such disfluencies are an obstacle to the automatic processing of speech and of its transcriptions. Speech disfluencies are studied using speech corpora with Rich Transcription, that is, transcription in which phenomena such as sentence boundaries, fillers and disfluencies are marked. Among such corpora are Czech Broadcast Conversation MDE Transcripts and SWITCHBOARD. It is still unclear what knowledge a speech recognition system should use to classify and detect speech disfluencies, which is why there are no adequate models that would support their automatic processing. Processing methods fall into two groups: those that handle disfluencies by means of acoustic models alone, and those that use combined acoustic and language models. For practical reasons (the time and expert effort required), researchers frequently use only acoustic modeling in speech recognition systems. Many papers describe the modeling of speech disfluencies as part of ASR systems. Another group of approaches aims to increase recognition accuracy by separating disfluencies from the speech signal in advance or by means of speech transcriptions. Among the possible approaches to these phenomena in ASR systems, some model and detect disfluencies as separate verbal and paralinguistic elements, while others ignore them, only distinguishing them from useful speech without telling one type from another.
An alternative method processes disfluencies as part of language modeling and unknown-word modeling: speech disfluencies are treated as an Unknown Words class, and a language model is then built that takes these phenomena into account. No methods for processing speech disfluencies have been developed for the Russian language, so it is worth applying different methods and comparing the results. Because it is expensive to build a corpus of transcripts that accounts for speech disfluencies and is suitable for training a language model (at least a 3-gram model), processing speech disfluencies with parametric methods seems to be optimal.
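
The unknown-word treatment mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the filler inventory, the `<DISFL>` class token, the helper names and the toy English corpus are all assumptions made for illustration. Filled pauses are mapped onto a single class token before 3-gram counts are collected, so the language model sees one shared event instead of many rare surface forms.

```python
from collections import Counter

# Hypothetical filler inventory; a real system would derive it from corpus annotation.
FILLED_PAUSES = {"uh", "um", "er", "eh"}
# Assumed class token, playing the same role as <UNK> for unknown words.
DISFLUENCY_TOKEN = "<DISFL>"


def map_disfluencies(tokens):
    """Replace every filled pause with the shared class token."""
    return [DISFLUENCY_TOKEN if t.lower() in FILLED_PAUSES else t for t in tokens]


def trigram_counts(sentences):
    """Count 3-grams over sentences padded with boundary markers."""
    counts = Counter()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + map_disfluencies(tokens) + ["</s>"]
        for i in range(len(padded) - 2):
            counts[tuple(padded[i:i + 3])] += 1
    return counts


# Toy corpus (illustrative only).
corpus = [
    "i uh want a ticket".split(),
    "i um want a refund".split(),
]
counts = trigram_counts(corpus)
```

In this toy corpus the two fillers "uh" and "um" contribute to the same 3-gram context, which is the point of the class-based treatment: sparse disfluency events are pooled, so fewer annotated transcripts are needed to estimate them.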

Keywords

speech disfluencies, automatic speech recognition, speech analysis

Authors

Name	Organization	E-mail
Verkhodanova Vasilisa O.	Saint-Petersburg Institute for Informatics and Automation of Russian Academy of Sciences	interiora@gmail.com
Karpov Alexey A.	Saint-Petersburg Institute for Informatics and Automation of Russian Academy of Sciences; St. Petersburg State University	karpov@iias.spb.su
