Text Incoherence, or Some Pitfalls of Automatic Text Processing
The article discusses little-studied aspects of the analysis of frame expressions, i.e. units that divide information into semantic blocks, each interpreted according to a criterion (temporal, spatial, or communicative) set by the frame expression, and that thereby ensure text coherence. The analysis is based on examples from the Russian National Corpus. Most studies take an ascending approach to text coherence, i.e. they integrate minimal discourse units into higher-level units, as in Rhetorical Structure Theory. This article instead takes a descending approach and analyzes the segmentation of the text into smaller units. This approach proves productive: it shows that texts contain not only signals of coherence but also signals of discreteness, which warn that the preceding and the following context are not directly connected, or not connected at all. Frame expressions function as such signals. Without claiming to be exhaustive, the article raises questions that arise in describing frame expressions, and the author answers some of them. First, the semantic and functional properties of frame expressions are described: 1) weak syntactic dependence on the predication (peripheral syntactic position), 2) topic status, 3) certain semantic features. The author then analyzes how the scopes of frame expressions interact and gives a number of possible configurations: 1) the state of affairs q is integrated into the frame U, which opens with the state of affairs p and remains open; 2) the state of affairs q is integrated into the frame U2, which closes the frame U; 3) the state of affairs q is integrated into the frame U2, which is subordinate to the frame Ui. Finally, the author analyzes the interaction of frame expressions with connectives as markers of logical-semantic relations: the frame defines the boundaries of the text spans between which the relation is established. The final section uses four resources (Penn Discourse Treebank, Supra-corpora database of connectives, RST Discourse Treebank, ANNODIS) to show how frame expressions are handled in text annotation. Having established that the linguistic units ensuring text coherence are heterogeneous, the author concludes that each category of these units should be annotated separately; only then can the mechanisms of their interaction be shown. The results can be used in the study of discourse structure and in text annotation.
Keywords
text coherence, semantics, automatic text processing, rhetorical relations, frame expressions
Authors
Name | Organization | E-mail |
Inkova Olga Yu. | University of Geneva; Federal Research Center "Informatics and Management" of the Russian Academy of Sciences | olga.inkova@unige.ch |
Text Incoherence, or Some Pitfalls of Automatic Text Processing | Vestnik Tomskogo gosudarstvennogo universiteta. Filologiya – Tomsk State University Journal of Philology. 2021. № 74. DOI: 10.17223/19986645/74/5