Tomsk dialectal corpora: the starting point
The dialectal archive collected by the dialectologists of Tomsk State University consists of 1,500 notebooks containing records of regional speech varieties, representing the dialects of Russian old-timers in the Middle-Ob region. As a unique monument of national speech culture, this collection is a valuable source for those who study the history and current state of the dialectal form of the Russian national language. It is therefore a matter of urgency to design dialectal corpora based on the Middle-Ob dialects. The methods of corpus linguistics make it possible to take the Tomsk dialectological school to the next level of research in line with modern technologies.
To this end, in 2010 a group of dialectologists and programmers at Tomsk State University (E.A. Yurina, V.V. Poddubny, S.V. Voloshina, M.A. Tolstova, O.G. Shevelev) drew up a project for the Middle-Ob dialectal corpora. The project is expected to result in dialectal corpora available to a wide range of researchers, which can later be integrated as subcorpora into larger national projects.
Within the project, paper-based dialectal texts will be converted into digital form. For this purpose the research team has developed a Unicode-based system of graphical symbols to reproduce the peculiarities of dialectal phonetics and morphology. For meta-tagging, the team has defined the parameters of the text passport, thematic classification and genre specification. The passport of a text includes data about the informants, the time and place of the recording, and the number of the notebook. The theme section specifies the time of the event (1920s, 1930s, 1940s, etc.) and the theme of the record, expressed by the key concept that determines the semantic unity of the text (War, Family, Kitchen-garden, Kolkhoz, House, etc.). The genre of a text is assigned according to the type of speech and narration as a whole: Biography, Narrative, Description, Interview, Fairy-tale, Song, Chastooshka (two-line or four-line humorous folk song), Proverb. The genre classification ends with an indication of the text subgenre: informative (reporting about an event, expressing intention, opinion, etc.); imperative (request, order, instruction, etc.); ritual (greeting, farewell, apology, etc.); evaluating (approval, disapproval, etc.).
In the near future the research team plans to compile a consolidated electronic dictionary of the Middle-Ob dialects and to design a software platform for the corpora with functions for automated text tagging, linguistic database storage, multi-aspect search for words and phrases by user-defined parameters, statistical data processing and saving of search results.
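As an illustration only, the meta-tagging parameters described above (passport, theme, genre, subgenre) could be modelled as a small record type with a parameter-based search. This is a minimal sketch, not the project's actual schema or software: the class names, field names, the `search` helper and the sample values are all hypothetical. Only the category labels (genres, subgenres, themes, decades) come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Genre(Enum):
    # Genre labels as listed in the abstract.
    BIOGRAPHY = "Biography"
    NARRATIVE = "Narrative"
    DESCRIPTION = "Description"
    INTERVIEW = "Interview"
    FAIRY_TALE = "Fairy-tale"
    SONG = "Song"
    CHASTOOSHKA = "Chastooshka"
    PROVERB = "Proverb"


class Subgenre(Enum):
    # Subgenre labels as listed in the abstract.
    INFORMATIVE = "informative"
    IMPERATIVE = "imperative"
    RITUAL = "ritual"
    EVALUATING = "evaluating"


@dataclass
class Passport:
    # Hypothetical field names for the passport data.
    informants: List[str]
    recording_time: str
    recording_place: str
    notebook_number: int  # number of the copybook (notebook)


@dataclass
class TextRecord:
    passport: Passport
    event_period: str  # time of the event, e.g. "1940s"
    theme: str         # key concept, e.g. "War"
    genre: Genre
    subgenre: Subgenre
    text: str


def search(records, **criteria):
    """Return the records whose top-level fields equal every given criterion."""
    return [
        r for r in records
        if all(getattr(r, name) == value for name, value in criteria.items())
    ]


# Placeholder records, invented purely to show the shape of the data.
SAMPLE = [
    TextRecord(Passport(["informant A"], "placeholder date", "placeholder village", 1),
               "1940s", "War", Genre.NARRATIVE, Subgenre.INFORMATIVE, "..."),
    TextRecord(Passport(["informant B"], "placeholder date", "placeholder village", 2),
               "1930s", "Kolkhoz", Genre.INTERVIEW, Subgenre.EVALUATING, "..."),
]

war_narratives = search(SAMPLE, theme="War", genre=Genre.NARRATIVE)
```

Here `search(SAMPLE, theme="War", genre=Genre.NARRATIVE)` selects only the first placeholder record. A real platform would presumably run such queries against a linguistic database rather than an in-memory list.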
Keywords
Tomsk dialectal corpora, meta-tagging parameters, the Middle-Ob dialects
Authors
Name | Organization | E-mail |
Yurina Yelena A. | Tomsk State University | yourina2007@yandex.ru |