Text Incoherence, or Some Pitfalls of Automatic Text Processing | Vestnik Tomskogo gosudarstvennogo universiteta. Filologiya – Tomsk State University Journal of Philology. 2021. № 74. DOI: 10.17223/19986645/74/5

Text Incoherence, or Some Pitfalls of Automatic Text Processing

The article discusses little-studied aspects of the analysis of frame expressions, i.e. units that divide information into semantic blocks. These blocks are interpreted according to certain criteria (temporal, spatial, or communicative) that the frame expressions themselves set, and in this way frame expressions ensure text coherence. The analysis is based on examples from the Russian National Corpus. Most studies take an ascending approach to text coherence, i.e. the integration of minimal discourse units into higher-level units, as in Rhetorical Structure Theory. This article instead takes a descending approach and analyzes the segmentation of the text into smaller units. The approach proves productive because it shows that texts contain not only signals of coherence but also signals of discreteness. Such signals warn that there is no direct connection between the preceding and the following context, or no connection at all. Frame expressions serve as such signals. Without claiming to be exhaustive, the article raises questions that arise in describing frame expressions, and the author answers some of them. First, the semantic and functional properties of frame expressions are described: 1) weak syntactic dependence on the predication (peripheral syntactic position), 2) topic status, 3) certain semantic features. The author then analyzes how the scopes of frame expressions interact and gives a number of possible configurations: 1) the state of affairs q is integrated into the frame U1, which opens with the state of affairs p and remains open; 2) the state of affairs q is integrated into the frame U2, which closes the frame U1; 3) the state of affairs q is integrated into the frame U2, which is subordinate to the frame U1. Finally, the author analyzes the interaction of frame expressions with connectives as markers of logical-semantic relations.
This interaction manifests itself in the fact that the frame defines the boundaries of the text spans between which the relation is established. Using four resources as examples (Penn Discourse Treebank, the Supra-corpora database of connectives, RST Discourse Treebank, ANNODIS), the final section shows how frame expressions are dealt with in the text annotation process. Having established that the linguistic units ensuring text coherence are heterogeneous, the author concludes that each category of these units should be annotated separately; only then can the mechanisms of their interaction be shown. The results can be used in the study of discourse structure and in text annotation.
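The three configurations of frame scopes listed in the abstract can be pictured as operations on a stack of open frames. The following Python sketch is a hypothetical toy model, not the author's formalism: the names `Frame`, `FrameTracker`, `continue_frame`, `replace_frame` and `open_subframe` are invented for illustration. It assumes that states of affairs arrive one at a time and that the choice of configuration has already been made elsewhere (by an annotator or a separate analysis).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    """A frame opened by a frame expression (toy model; names are hypothetical)."""

    label: str
    parent: Optional[Frame] = None  # enclosing frame, if this one is subordinate
    members: list[str] = field(default_factory=list)  # states of affairs in scope

    def path(self) -> str:
        """Chain of enclosing frames, outermost first, e.g. 'U1/U2'."""
        if self.parent is None:
            return self.label
        return f"{self.parent.path()}/{self.label}"


class FrameTracker:
    """Stack of currently open frames; the innermost open frame is last."""

    def __init__(self) -> None:
        self.open: list[Frame] = []

    def open_frame(self, label: str, state: str) -> Frame:
        """Open a top-level frame (e.g. U1) with its first state of affairs p."""
        self.open.clear()  # simplification: a new top-level frame closes all others
        frame = Frame(label, members=[state])
        self.open.append(frame)
        return frame

    def continue_frame(self, state: str) -> Frame:
        """Configuration 1: q joins the innermost open frame, which stays open."""
        frame = self.open[-1]
        frame.members.append(state)
        return frame

    def replace_frame(self, label: str, state: str) -> Frame:
        """Configuration 2: a new frame U2 closes the innermost frame and takes its place."""
        closed = self.open.pop()
        frame = Frame(label, parent=closed.parent, members=[state])
        self.open.append(frame)
        return frame

    def open_subframe(self, label: str, state: str) -> Frame:
        """Configuration 3: a new frame U2 opens inside (subordinate to) the innermost frame."""
        frame = Frame(label, parent=self.open[-1], members=[state])
        self.open.append(frame)
        return frame


if __name__ == "__main__":
    for config in ("continue", "replace", "subframe"):
        tracker = FrameTracker()
        tracker.open_frame("U1", "p")
        if config == "continue":
            frame = tracker.continue_frame("q")
        elif config == "replace":
            frame = tracker.replace_frame("U2", "q")
        else:
            frame = tracker.open_subframe("U2", "q")
        print(config, frame.path(), frame.members)
    # continue U1 ['p', 'q']
    # replace U2 ['q']
    # subframe U1/U2 ['q']
```

The stack is a deliberate simplification: it captures only whether q stays in U1, lands in a frame that replaces U1, or lands in a frame nested under U1, and says nothing about how a frame expression's scope is actually detected in text.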
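The conclusion that frames and connectives should be annotated as separate layers can be illustrated with a small query that needs both layers at once. The Python sketch below is hypothetical: it assumes sentence-level spans and invented record types (`Span`, `FrameAnnotation`, `ConnectiveAnnotation`), and it does not reproduce the format of the Penn Discourse Treebank, the RST Discourse Treebank, ANNODIS, or the Supra-corpora database. It asks whether a connective's two arguments fall within one frame or straddle a frame boundary.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Inclusive range of sentence indices (hypothetical annotation unit)."""

    start: int
    end: int

    def contains(self, other: Span) -> bool:
        return self.start <= other.start and other.end <= self.end


@dataclass(frozen=True)
class FrameAnnotation:
    """Frame layer: a frame expression and the sentences in its scope."""

    expression: str  # e.g. "In 1990"
    scope: Span


@dataclass(frozen=True)
class ConnectiveAnnotation:
    """Connective layer: a connective, a relation label, and its two arguments."""

    connective: str  # e.g. "but"
    relation: str    # label from a hypothetical relation inventory
    arg1: Span
    arg2: Span


def enclosing_frame(frames: list[FrameAnnotation], span: Span) -> Optional[FrameAnnotation]:
    """Return the narrowest frame whose scope contains the span, or None."""
    candidates = [f for f in frames if f.scope.contains(span)]
    return min(candidates, key=lambda f: f.scope.end - f.scope.start, default=None)


def relation_vs_frames(frames: list[FrameAnnotation], conn: ConnectiveAnnotation) -> str:
    """Classify a connective relation against the separately annotated frame layer."""
    f1 = enclosing_frame(frames, conn.arg1)
    f2 = enclosing_frame(frames, conn.arg2)
    if f1 is None or f2 is None:
        return "argument outside any single frame"
    return "within one frame" if f1 == f2 else "across a frame boundary"


if __name__ == "__main__":
    # Invented data: sentences 0-1 fall under "In 1990", sentences 2-3 under "In 1995".
    frames = [FrameAnnotation("In 1990", Span(0, 1)), FrameAnnotation("In 1995", Span(2, 3))]
    inside = ConnectiveAnnotation("however", "contrast", Span(0, 0), Span(1, 1))
    across = ConnectiveAnnotation("but", "contrast", Span(1, 1), Span(2, 2))
    print(relation_vs_frames(frames, inside))  # within one frame
    print(relation_vs_frames(frames, across))  # across a frame boundary
```

In this toy setting the query works only because each layer is stored on its own; a single merged label per span would have no place to record that a relation crosses a frame boundary.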


Keywords

text coherence, semantics, automatic text processing, rhetorical relations, frame expressions

Authors

Name | Organization | E-mail
Inkova Olga Yu. | University of Geneva; Federal Research Center "Informatics and Management" of the Russian Academy of Sciences | olga.inkova@unige.ch

References

Mann W.C., Thompson S.A. Rhetorical Structure Theory: Towards a Functional Theory of Text Organization // Text & Talk. 1988. № 8 (3). P. 243-281.
Hobbs J.R. Coherence and Coreference // Cognitive Science. 1979. № 3. P. 67-90.
Webber B., Prasad R., Lee A., Joshi A. The Penn Discourse Treebank 3.0. Annotation Manual. 2019. URL: https://catalog.ldc.upenn.edu/docs/LDC2019T05/PDTB3-Annotation-Manual.pdf.
PDTB Research Group. The Penn Discourse Treebank 2.0 Annotation Manual. Technical Report IRCS-08-01. Philadelphia : Institute for Research in Cognitive Science, University of Pennsylvania, 2008. URL: https://www.seas.upenn.edu/~pdtb/PDTBAPI/pdtb-annotation-manual.pdf
Inkova O. Logical-semantic relations: problems of classification // Inkova O., Manzotti E. Text coherence: mereological logical-semantic relations. Moscow, 2019. P. 11-98. (In Russian)
Adam J.-M. (ed.) Faire texte. Frontières textuelles et opérations de textualisation. Besançon : Presses universitaires de Franche-Comté, 2015. 356 p.
Kibrik A.A. Discourse // Kibrik A.E. et al. Introduction to the science of language. Moscow, 2019. P. 126-163. (In Russian)
Dijk T.A. van. Text grammar and text logic // Studies in Text Grammar / ed. by J. Petöfi, H. Rieser. Dordrecht : Reidel, 1973. P. 17-76.
Martin R. La logique du sens. Paris : PUF, 1983.
Fauconnier G. Espaces mentaux. Paris : Minuit, 1984.
Kamp H., Reyle U. From Discourse to Logic. Dordrecht : Kluwer, 1993.
Charolles M. L’encadrement du discours // Cahier de Recherche Linguistique. 1997. № 6. P. 1-73. URL: https://hal.archives-ouvertes.fr/hal-00665849
Russian National Corpus. URL: www.ruscorpora.ru
Sarda L., Carter-Thomas Sh., Fagard B., Charolles M. Adverbials in Use. From Predicative to Discourse Functions. Louvain-La-Neuve : Mons, Presses Universitaires de Louvain, 2014.
Shvedova N.Yu. The determining object and the determining adverbial as independent extenders of the sentence // Voprosy yazykoznaniya. 1964. № 6. P. 77-93. (In Russian)
Carlson L., Marcu D., Okurowski M.E. RST Discourse Treebank. Philadelphia : Linguistic Data Consortium, 2002. URL: https://catalog.ldc.upenn.edu/LDC2002T07
Carlson L., Marcu D., Okurowski M.E. Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory // Proceedings of the Second SIGdial Workshop on Discourse and Dialogue. 2001. URL: https://www.aclweb.org/anthology/W01-1605
Asher N., Lascarides A. Logics of conversation. Cambridge : Cambridge University Press, 2003.
Charolles M., Le Draoulec A., Péry-Woodley M.-P., Sarda L. Temporal and spatial dimensions of discourse organization // Journal of French Language Studies. 2005. Vol. 15. P. 203-218.
Vieu L., Bras M., Asher N., Aurnague M. Locating adverbials in discourse // Journal of French Language Studies. 2005. Vol. 15. P. 173-193.
Prévot L., Vieu L., Asher N. Une formalisation plus précise pour une annotation moins confuse : la relation d’élaboration d’entité // Journal of French Language Studies. 2009. Vol. 19. P. 207-228.
Muller P., Vergez M., Prévot L., Asher N., Benamara F., Bras M., Le Draoulec A., Vieu L. Manuel d’annotation en relations de discours du projet ANNODIS. (Carnets de grammaire, Rapport n°21. CLLE-ERSS.) Toulouse : Université de Toulouse Jean Jaurès, 2012.